PREFACE


This text contains ample material for a one-term precalculus introduction
to probability theory. It can be used by itself as an elementary
introduction to probability, or as the probability half of a one-year
probability-statistics course. Although the development of the subject is
rigorous, experimental motivation is maintained throughout the text.
Statistical and practical applications are also given throughout.

The core of the text consists of the unstarred sections, most of
Chapters 1-3 and 5-7. Included are finite probability spaces,
combinatorics, set theory, independence and conditional probability,
random variables, Chebyshev's theorem, the law of large numbers, the
binomial distribution, the normal distribution, and the normal
approximation to the binomial distribution. The starred sections include
limiting and infinite processes, a mathematical discussion of symmetry, and
game theory. These sections are marked with an asterisk (*); they are
optional and sometimes more difficult.

I have, in most places throughout the text, given decimal equivalents
to fractional answers. Thus, while the mathematician finds the answer
p = 17/143 satisfactory, the scientist is best appeased by the decimal
approximation p = 0.119. A decimal answer gives a ready way of finding
the correct order of magnitude and of comparing probabilities. Also, in
applications, decimal answers are often mandatory. Still, since 17/143
equals 0.119 only to three places, one is confronted with the problem of
notation. Should one write 17/143 ≈ 0.119, or 17/143 = 0.119−, or
17/143 = 0.119 (three places)? This author simply wrote 17/143 = 0.119.
The reader must therefore be prepared for the convention (well established
in common usage) that an equation involving decimals is, in most cases,
only an approximation.
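The convention can be illustrated with a short Python sketch (our own illustration, not part of the original text). It keeps the exact answer with the standard `fractions` module and rounds it to three places; the rounded value is slightly larger than the exact one, which is why the "=" sign is only approximate.

```python
from fractions import Fraction

exact = Fraction(17, 143)          # the exact answer used as an example above
approx = round(float(exact), 3)    # three-place decimal approximation

print(exact, float(exact), approx)
print(approx > float(exact))       # True: 0.119 slightly overstates 17/143
```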

I wish to acknowledge my debt to some of the people who helped in
the production of this text. Mrs. Eleanor Figer did the typing of the final
manuscript. E. Sherry Miller, my student, worked the problems and grew
to appreciate the value of the slide rule. Leonard Hausner, my assistant
and son, performed the probability experiments. Finally, thanks are
hereby given to my class A63.0004 (Introduction to Probability) for
submitting to the first dittoed version of the text and for criticizing and
correcting much of it.
CHAPTER 1 THE FOUNDATIONS


INTRODUCTION 


Probability theory is a branch of mathematics concerned with the measure- 
ment and understanding of uncertainty. Historically, this theory came into 
being to analyze certain games of chance. But it is clear that uncertainty 
occurs not only in gambling situations but all around us. When we ask what 
the maximum temperature will be in Chicago next July 17, or how many 
traffic fatalities will occur in New Jersey on the Memorial Day weekend, 
most people will agree (before the event occurs) that there is an element of 
uncertainty to the answer. An astonishing feature of probability theory is 
that it is possible to have such a theory at all. Yet this theory is not only 
possible, it is also one of the most interesting and fruitful theories of pure 
and applied mathematics. 

In this chapter we lay the foundations for the subject by considering the 
experimental basis and meaning of probability and then formulating the 
mathematical description. This sets the stage for the more sophisticated 
theory which will be taken up in the succeeding chapters. 


1 EXPERIMENTAL BASIS FOR PROBABILITY 


In a world of uncertainty, we soon learn that some things are more uncertain
than others. The sex of a child about to be born is usually uncertain, but very
likely an expectant mother will not have twins, and almost surely she will
not be a mother of quadruplets. The driver of an automobile knows that most
likely he will not be involved in an accident, but he is probably aware that an
accident can occur, so he may well fasten his seat belt. If a pack of cards
is well shuffled, it is uncertain whether the top card is a black card, it is
unlikely that it is the ace of spades, and it is extremely unlikely that the top 5
cards are the 10, J, Q, K, and ace, all in spades.¹ The possibilities in our world
range from impossible, to unlikely, to maybe, to a good bet, to certainty.

These varieties of uncertainty suggest that we measure how certain an 
event is. This can be done by performing an experiment many times and 
observing the results. Let us consider an example. 


1 Example 


Three dice² are tossed. What is the highest number that turns up?

An appropriate experiment is to toss 3 dice and record the highest number
that appears. This experiment was actually performed 100 times and the
results appear in Table 1.1. The results suggest that 6 is more likely to occur
as high number than 5. Similarly, 5 appears more likely than 4, etc. The fact
that 2 appeared more often than 3 seems to indicate that 2 is more likely as
high number than 3. But the table as a whole, and perhaps a healthy intuition,
indicates otherwise, and we rather expect that 3 is more likely than 2. This
particular experiment was repeated another 100 times with the results
recorded in Table 1.2.


1.1 Results of the 3-Dice Experiment

Highest number showing                        1    2    3    4    5    6
Number of times this high number occurred     1    6    5   12   33   43


1.2 Additional Results of the 3-Dice Experiment

Highest number showing                        1    2    3    4    5    6
Number of times this high number occurred                             41


1 In this book we take for granted a familiarity with a standard pack of 52 cards. This pack
contains 4 suits: hearts, diamonds, clubs, and spades; the first 2 are red and the last 2 are
black. There are 13 cards in each suit: ace (A), king (K), queen (Q), jack (J), 10, 9, 8, 7, 6, 5, 4,
3, and 2.

2 A die is a cube whose 6 faces are engraved with the numbers 1, 2, 3, 4, 5, and 6, respectively.
The plural of "die" is "dice."
When we compare Tables 1.1 and 1.2, we should be more surprised by the
similarities than by the obvious differences. It is as if some benevolent god
were watching the experiments and arranged to make the results similar. Later
we shall learn how to compute the probability that 6 is the highest number
showing on the 3 dice that were thrown. This probability turns out to be .421,
or 42.1 percent. According to probability theory, this percentage will very
likely be close³ to the actual percentage of the time that 6 appears as high
number if the experiment is repeated a large number of times. In fact, this
figure is strikingly close to the observed 43 and 41 percent in Tables 1.1 and
1.2. Similarly, probabilities can be found for a 5 high, 4 high, etc. These
probabilities and the percentage results of Tables 1.1 and 1.2 are listed in
Table 1.3. In later chapters we shall learn how to compute these probabilities.


1.3 Summary of the 3-Dice Experiment

High number                                     1     2     3     4     5     6
Percentage of occurrence (first 100 trials)     1     6     5    12    33    43
Percentage of occurrence (second 100 trials)                           26    41
Probability (percent, theoretically derived)  0.5   3.2   8.8  17.1  28.2  42.1
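The last row of Table 1.3 can be checked by brute force. The Python sketch below is our own illustration, not the method the book develops later: it assumes three fair dice, so that all 6³ = 216 ordered tosses are equally likely, counts how many tosses give each highest number, and also simulates one run of 100 trials, with Python's pseudo-random generator standing in for real dice. The function names are hypothetical.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction


def exact_high_probabilities(num_dice=3, faces=6):
    """Exact probability of each highest number, assuming all tosses equally likely."""
    tosses = itertools.product(range(1, faces + 1), repeat=num_dice)
    counts = Counter(max(toss) for toss in tosses)
    total = faces ** num_dice
    return {h: Fraction(counts[h], total) for h in range(1, faces + 1)}


def simulate_high_counts(trials, num_dice=3, faces=6, seed=None):
    """Frequency of each highest number in `trials` simulated tosses of the dice."""
    rng = random.Random(seed)
    counts = Counter(
        max(rng.randint(1, faces) for _ in range(num_dice)) for _ in range(trials)
    )
    return {h: counts[h] for h in range(1, faces + 1)}


probs = exact_high_probabilities()
sim = simulate_high_counts(100, seed=1)
for h in range(1, 7):
    print(f"high {h}: probability {100 * float(probs[h]):4.1f}%, simulated count {sim[h]}")
```

The exact column reproduces the 3.2, 17.1, 28.2, and 42.1 percent quoted in the text (for example, 91/216 ≈ .421 for a 6 high). The simulated counts change from seed to seed, just as Tables 1.1 and 1.2 differ from each other.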


This example is typical of many probability experiments. Before proceed- 
ing to some other examples, it will be useful to state some of the features 
common to probability experiments. 

1. In contrast to many deterministic scientific experiments, in which there
can be only 1 outcome, there were several possible outcomes of this
experiment. These may be conveniently labeled $s_1, s_2, s_3, s_4, s_5$, and
$s_6$. Here $s_5$, for example, stands for "the highest number showing on the
3 dice is 5." Similar definitions apply to $s_1$, $s_2$, etc.

2. The experiment was repeated many times. Here we can see why such
gambling devices as dice, cards, etc., are useful tools in the experimental
probabilist's hands. With comparative ease it was possible to repeat our
experiment many times. Clearly, if we wish to find the probability that Mrs.
Jones's next baby will be a male, it seems unreasonable (as well as
impossible) to try the experiment 100 times! When we call for experimental
probabilities, we must repeat an experiment many times and make sure that
the same experiment is run each time. We must be sure to shake the dice, or
to shuffle the cards well. In the case of Mrs. Jones, we prefer to ask the
question, "What is the sex of a child born in Englewood Hospital?" We then
apply these results to the unknown sex of Mrs. Jones's unborn baby. Here
hospital statistics furnish figures analogous to those in Table 1.1. This
"experiment" is repeated many times during a year at the hospital.

3 To be clarified in Chapter 6.

When an experiment is repeated many times, it is not the number of times 
each outcome occurs that will be of primary significance for us. It is the 
relative frequency or fraction of the time that each outcome occurs that will 
be most important. Hence we make the following definition. 


2 Definition 


Suppose that the outcomes of an experiment may be $s_1, s_2, \ldots, s_k$.
Suppose this experiment is repeated $N$ times and that the outcome $s_1$
occurs $n_1$ times, ..., $s_k$ occurs $n_k$ times. Then $n_1$ is called the
frequency of $s_1$ (in that particular run of $N$ experiments), and the
relative frequency $f_1$ of $s_1$ is defined by the formula

$$f_1 = \frac{n_1}{N}$$

with similar definitions for the frequency and relative frequency of each of
the other outcomes $s_2, \ldots, s_k$.


Remark. Clearly $f_1, f_2, \ldots, f_k$ may vary from experiment to
experiment. For example, in Table 1.1, $k = 6$, and our outcomes are
$s_1, s_2, s_3, s_4, s_5$, and $s_6$. $N = 1 + 6 + 5 + 12 + 33 + 43 = 100$.
Also, $n_1 = 1$, $n_2 = 6$, $n_3 = 5$, $n_4 = 12$, $n_5 = 33$, and
$n_6 = 43$. Thus $f_1 = n_1/N = 1/100 = .01 = 1$ percent,
$f_2 = n_2/N = 6/100 = .06 = 6$ percent, etc. The relative frequencies in
Table 1.2 are different, and in any comparison of the two tables it would be
necessary to use different notations. Thus we might use
$g_1, g_2, \ldots, g_k$ for relative frequencies, and $m_1, m_2, \ldots, m_k$
for the frequencies.
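The arithmetic of the Remark can be sketched in Python (an illustration of ours; the helper name `relative_frequencies` is hypothetical). Exact fractions keep the values identical to the hand computation, and the last line previews Theorem 3, stated next: the relative frequencies add up to 1.

```python
from fractions import Fraction

# Frequencies n_1, ..., n_6 from Table 1.1 (highest number 1 through 6).
frequencies = [1, 6, 5, 12, 33, 43]


def relative_frequencies(counts):
    """Return the relative frequencies f_i = n_i / N, where N is the total count."""
    N = sum(counts)
    return [Fraction(n, N) for n in counts]


f = relative_frequencies(frequencies)
print([float(x) for x in f])   # [0.01, 0.06, 0.05, 0.12, 0.33, 0.43]
print(sum(f))                  # 1
```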
3 Theorem 
Suppose $s_1, s_2, \ldots, s_k$ are the possible outcomes of an experiment. If
the experiment is repeated several times and these outcomes occur with
relative frequencies $f_1, \ldots, f_k$, respectively, then we have

$$0 \le f_1 \le 1, \quad 0 \le f_2 \le 1, \quad \ldots, \quad 0 \le f_k \le 1 \tag{1.1}$$

and

$$f_1 + f_2 + \cdots + f_k = 1 \tag{1.2}$$


This theorem simply states that the percentages are neither negative nor over
100 percent, and that they add up to 100 percent. The proof is as follows.

We know that $s_i$ occurs $n_i$ times ($i = 1, \ldots, k$).⁴ Thus

$$N = n_1 + n_2 + \cdots + n_k \tag{1.3}$$

We have $0 \le n_i$, because an event cannot occur a negative number of times.
Therefore,

$$0 \le n_i \le n_1 + n_2 + \cdots + n_k = N$$

Dividing by $N$, we have

$$0 \le \frac{n_i}{N} \le 1$$

or

$$0 \le f_i \le 1$$

Furthermore,

$$f_1 + f_2 + \cdots + f_k = \frac{n_1}{N} + \frac{n_2}{N} + \cdots + \frac{n_k}{N} = \frac{n_1 + n_2 + \cdots + n_k}{N} = \frac{N}{N} = 1$$

This proves the result.

We can now give the statistical definition of probability. Suppose an
experiment has the possible outcomes $s_1, \ldots, s_k$. These outcomes are
said to have the probabilities $p_1, \ldots, p_k$, respectively, if, when the
experiment is repeated $N$ times and $N$ is very large, the relative
frequencies $f_1, \ldots, f_k$ of the outcomes $s_1, \ldots, s_k$ will in all
likelihood be very close to $p_1, \ldots, p_k$. Briefly,
$f_1 \approx p_1, \ldots, f_k \approx p_k$ when $N$ is large. (The symbol
"$\approx$" is read "is nearly equal to.") This "definition" can be validly
criticized on several points. It is vague, because the terms "very large" and
"very close" are used. Also, it hedges a bit with the phrase "in all
likelihood." Nevertheless, the intuitive idea is clear: If the experiment is
repeated many times, the relative frequency $f_i$ of $s_i$ will be close to
$p_i$. Thus, despite the unpredictable outcome of an experiment, the relative
frequency of an outcome can be approximately forecast if a large number of
experiments is to be performed. The reader will note (cf. Table 1.3) that in
the dice experiment with $N = 100$, the relative frequencies of the various
outcomes never differed by more than 5.1 percent (.051) from the
probabilities.
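The statistical definition can be illustrated by simulation. The sketch below assumes a fair die, so that each face has probability 1/6 (an assumption made explicit in Section 2), and uses Python's pseudo-random generator in place of real tosses; `max_deviation` is a hypothetical helper. For each $N$ it reports the largest gap between a relative frequency $f_i$ and $p_i = 1/6$. In keeping with the phrase "in all likelihood," the gap usually, though not invariably, shrinks as $N$ grows.

```python
import random


def max_deviation(N, rng):
    """Toss a simulated fair die N times; return the largest |f_i - 1/6| over the 6 faces."""
    counts = [0] * 6
    for _ in range(N):
        counts[rng.randint(1, 6) - 1] += 1
    return max(abs(n / N - 1 / 6) for n in counts)


rng = random.Random(2024)   # fixed seed so the run can be repeated
for N in (60, 600, 6_000, 60_000):
    print(f"N = {N:>6}: largest |f_i - p_i| = {max_deviation(N, rng):.4f}")
```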

The ability to predict, with high accuracy, the relative frequency of an
event that occurs at random is quite astonishing and is probably not
fully accepted by the average person. Still, every year we are presented with 
predictions of perhaps 500 or so traffic fatalities for the New Year’s week- 
end. The outcomes, alas, are always depressingly near the predictions. 

We have arrived at a quantitative measure of uncertainty. It is therefore
natural to define an outcome $s_1$ to be more likely than the outcome $s_2$ if
$p_1 > p_2$. (Here $p_i$ is the statistical probability of $s_i$.) Similarly, we
may define equally likely events $s_1$ and $s_2$ as events with equal
probability. We can say that $s_1$ is twice as likely to occur as $s_2$ if
$p_1 = 2p_2$. This means that if a large number of experiments is performed,
the relative frequency of $s_1$ will be approximately twice the relative
frequency of $s_2$. The range of uncertainty from impossible to certainty is
now expressed by the inequality $0 \le p_i \le 1$. Here 0 represents
impossibility, 1 certainty, and each number in this range represents a degree
of uncertainty. (See Fig. 1.4.)

4 This is a brief way of stating: "$s_1$ occurs $n_1$ times, $s_2$ occurs $n_2$ times, ..., $s_k$ occurs $n_k$ times."
The price we pay for brevity is the introduction of the new letter $i$.


p = 1     Certainty
          Virtually certain
p = .75   3:1 in favor of this event
p = .5    Could go either way
          Odds slightly against
p = .1    9:1 against this event happening
          Very unlikely
p = 0     Impossible

1.4 Quantitative Versus Qualitative Descriptions of the Likelihood of an Event
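Fig. 1.4 pairs probabilities with betting odds. Under the usual convention (assumed here, since the text does not spell it out), odds of a:b in favor of an event correspond to p = a/(a + b), and odds of a:b against it correspond to p = b/(a + b). A small Python sketch with hypothetical function names reproduces two of the labels in the figure:

```python
from fractions import Fraction


def prob_from_odds_in_favor(a, b):
    """Odds of a:b in favor of an event correspond to p = a / (a + b)."""
    return Fraction(a, a + b)


def prob_from_odds_against(a, b):
    """Odds of a:b against an event correspond to p = b / (a + b)."""
    return Fraction(b, a + b)


print(prob_from_odds_in_favor(3, 1))   # 3/4, the p = .75 mark
print(prob_from_odds_against(9, 1))    # 1/10, the p = .1 mark
```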


We note here that the so-called “law of averages” is often misapplied, 
because absolute rather than relative frequencies are incorrectly used. If 
an outcome (say tossing a coin and landing heads) has probability .5, we do 
not say that if the experiment is repeated (say) 10,000 times it will very 
likely occur 5,000 times. For even if it occurred 5,063 times, the relative 
frequency is .5063, and this number is very near .5000, although the actual 
number of occurrences is 63 more than 5,000. 
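The distinction between relative and absolute frequencies can be seen numerically. In the Python sketch below (our own; `summarize` is a hypothetical helper), the first line uses the figures just quoted. The loop then simulates a fair coin with Python's pseudo-random generator: the relative frequency of heads tends to settle near .5, while the excess of heads over N/2 need not shrink and is typically larger for larger N.

```python
import random


def summarize(heads, N):
    """Return (relative frequency of heads, excess of heads over N/2)."""
    return heads / N, heads - N / 2


# The figures quoted above: 5,063 heads in 10,000 tosses.
print(summarize(5063, 10_000))   # (0.5063, 63.0)

# Simulated fair-coin runs (pseudo-random tosses stand in for a real coin).
rng = random.Random(7)
for N in (100, 10_000, 100_000):
    heads = sum(rng.random() < 0.5 for _ in range(N))
    rel, excess = summarize(heads, N)
    print(f"N = {N:>7}: relative frequency {rel:.4f}, excess over N/2 = {excess:+.0f}")
```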

We now give the results of a few other experiments, together with the 
relative frequencies of the various outcomes and the probabilities. 
4 Example


A coin is tossed. Does it land heads or tails?

The experiment was repeated 500 times with the results shown in Table
1.5. In this example we have 2 outcomes, H (heads) and T (tails), and,
correspondingly, 2 frequencies, $n_H = 269$ and $n_T = 231$. Thus
$N = n_H + n_T = 500$ (the total number of experiments). The relative
frequency of heads is $f_H = n_H/N = 269/500 = .538$, and similarly
$f_T = .462$ (see Definition 2). Note that $f_H + f_T = 1$, as in Theorem 3.


1.5 Coin-Tossing Experiment

                       H      T
Frequency             269    231
Relative frequency   .538   .462
Probability            .5     .5


5 Example 
A die is tossed until a 6 turns up. How many throws are required? 

Experimental results are tabulated in Table 1.6. Here $N = 60$. There are
several points of interest here. First, there were infinitely many possible
outcomes of this experiment. It was conceivable that the number of throws
needed to obtain a 6 would be 1, 2, 3, etc., indefinitely. (It was even
conceivable that a 6 might never turn up.) Because we are mainly concerned
in this book with experiments having finitely many outcomes, we merely
lumped all outcomes that needed 13 or more throws into one outcome, "over
12." Thus we had only 13 outcomes $s_1, s_2, \ldots, s_{12}, s_{13}$, where
$s_{13}$ was the "over 12" outcome.
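For a fair die the probabilities of these 13 outcomes can be written down explicitly, although the text does not derive them here: the first 6 appears on throw j with probability (5/6)^(j-1)(1/6), and no 6 appears in the first 12 throws with probability (5/6)^12. The Python sketch below (function names hypothetical, fair die assumed) computes these exactly and simulates a run of 60 experiments, lumping every wait of 13 or more throws into $s_{13}$ as the text does.

```python
import random
from fractions import Fraction


def waiting_probabilities(cutoff=12):
    """p(s_1), ..., p(s_cutoff), then p(s_{cutoff+1}) for the lumped 'over cutoff' outcome."""
    q = Fraction(5, 6)   # chance that a single toss of a fair die is not a 6
    probs = [q ** (j - 1) * Fraction(1, 6) for j in range(1, cutoff + 1)]
    probs.append(q ** cutoff)   # no 6 at all in the first `cutoff` throws
    return probs


def simulate_waiting(N, cutoff=12, seed=None):
    """Repeat the experiment N times; return the frequencies n_1, ..., n_{cutoff+1}."""
    rng = random.Random(seed)
    counts = [0] * (cutoff + 1)
    for _ in range(N):
        throws = 1
        # Keep throwing until a 6 appears, but stop counting once we pass the cutoff.
        while throws <= cutoff and rng.randint(1, 6) != 6:
            throws += 1
        counts[throws - 1] += 1   # throws == cutoff + 1 means "over cutoff"
    return counts


probs = waiting_probabilities()
counts = simulate_waiting(60, seed=3)
for j, (p, n) in enumerate(zip(probs, counts), start=1):
    label = f"s_{j}" if j <= 12 else "s_13 (over 12)"
    print(f"{label:>15}: p = {float(p):.3f}, simulated frequency = {n}")
```

The exact probabilities decrease steadily from $s_1$ to $s_{12}$, in line with the text's remark that $s_1$ is the most likely outcome; a run of only 60 experiments can easily depart from that pattern.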

Another point to observe is that in this particular series of 60 experiments,
the fluctuations appear to be rather pronounced. The probabilities indicate
that $s_1$ is the most likely outcome, with $s_2$ next most likely, etc. Yet we
had 6 occurrences of $s_{10}$ and no occurrences of $s_5$. Also, $s_2$
occurred most often. These things happen, and when they do, there are at
least 4 explanations:


1. There were not enough trials: $N = 60$ is not very large.

2. A very unlikely event happened. (Loosely speaking, we expect
occasional miracles.)

3. The die was not shaken enough to ensure that the tosses were
completely random.

4. The results merely look unlikely, because the observer is not very
sophisticated.
Surprisingly, item 4 is often the best explanation for an apparent "miracle."
Note, however, that even the other explanations do not negate the idea of 
statistical probability. We must be willing to imagine a very large series of 
experiments and we must often concede that an actual series of experiments 
gives merely a slight indication of what is meant by statistical probability. 


EXERCISES 


1. An experiment with possible outcomes A, B, or C is repeated several 
times with frequencies as follows: 


Outcomes       A     B     C
Frequencies   119   203   278


Compute the relative frequencies $f_A$, $f_B$, and $f_C$.


2. When 2 dice are tossed, the sum of the numbers that turn up can be 
any integer between 2 and 12 inclusive. This experiment was performed by 
the author’s assistant with the results indicated. Compute the relative 
frequencies and compare with the indicated probabilities. 


Sum on 2 dice 


Probabilities 


Frequencies 


3. (Births: male or female?) According to the Statistical Abstract of
the United States, 1962, statistics for male (M) and female (F) births in the
United States during the years 1945-1949 are as indicated in Table 1.7.
Compute the year-by-year relative frequencies $f_M$ and $f_F$. Note that in
these tables N is quite large. What conclusions can you draw?


1.7 Live Births (in Thousands), by Sex

Year    Females
1945    1,391
1946    1,657
1947    1,857
Exercises 4 through 9 are essentially experiments to be performed by the
reader. In a classroom situation, a large number of experiments can be per- 
formed if each student does relatively few experiments. The experimental 
results should be saved to compare with the probabilities that will be deter- 
mined later in the book. 


4. A pack of cards is well shuffled. Three cards are taken from the top 
of the deck. How many black cards are there among these 3 cards? 

a. State the possible outcomes. 
b. Perform the appropriate experiment 50 times and record your results. 
c. Compute the relative frequencies. [Hint: In this exercise there are
clearly 4 outcomes: $s_0$ (no blacks), $s_1$ (1 black), $s_2$ (2 blacks), and
$s_3$ (3 blacks). A very good way of keeping count is indicated in the
following hypothetical, partially completed table:


A stroke of the pencil indicates that an event has occurred; every fifth
stroke is horizontal.]


5. Three dice are tossed. What is the highest number that turns up? 
a. State the possible outcomes. 
b. Perform the experiment 100 times and record your results. 
c. Compute the relative frequencies and compare with the probabilities 
in Table 1.3. 


6. Five dice are tossed. What is the middle number that turns up? (For 
example, if the dice thrown are, in increasing order, 1, 2, 2, 5, 6, the middle 
number is 2.) 

a. State the possible outcomes. 
b. Perform the experiment 100 times and record your results. 
c. Compute the relative frequencies. 


7. Ten coins are tossed. How many heads turn up? As in Exercises 4 
through 6, state the possible outcomes, perform the experiment 50 times, 
and compute the relative frequencies. 


8. The results of Exercise 7 may be used to find the relative frequency 
with which a tossed coin lands heads. Explain how this may be done. Find
this relative frequency using the experimental results obtained in Exercise 7. 


9. A coin is tossed until it lands tails. How many tosses will it take? As 
in Exercise 4, state the possible outcomes, perform the experiment 80 times, 
and compute the relative frequencies. 


Exercises 10 through 14 are discussion problems. There is not necessarily 
a correct answer. 


10. In the 3-dice experiment (Example 1), is there any reason you would
expect a 6 high to be more likely than a 5 high? Similarly, why might you
expect (before any experiments are performed) that a 5 high is more likely
than a 4 high, etc.?


11. In the coin-tossing experiment of Example 4, is there any reason to 
expect (before the experiment) that heads is as likely as, or more likely than, 
tails? 


12. In the waiting-for-6 experiment of Example 5, is there any reason to 
suppose that it is more likely that a 6 will turn up on toss 1 rather than on 
toss 2? 


13. A person tossed a coin 10,000 times. He claims that the first 5,001 
tosses were heads and the next 4,999 tosses were tails. When it is objected 
that this seems very unlikely, he claims that the relative frequency of heads 
is .5001, which is extremely close to the probability .5. Is the objection 
valid? Why? 


14. In Exercise 6 is there any reason to suppose that the middle number 1 
is less likely than 2? That 1 and 6 are equally likely? That 2 and 5 are equally 
likely? 


2 MATHEMATICAL FORMULATION OF PROBABILITY 


The injection of mathematics into the study of a physical phenomenon usually
has the effect of enormously aiding in the understanding of that phenomenon. 
Probably the most familiar example is geometry. From its crude, tentative 
beginnings involving the measurement of length, area, and volume, the bril- 
liant structure of Euclidean geometry was constructed. This structure has 
proved so successful that the physical applications are now more or less 
considered to be a rather trivial by-product of it. On the other hand, some 
topics have resisted a mathematical formulation. (For example, the mathe- 
matical study of esthetics has been attempted, but with no apparent success.) 
The test of any mathematical treatment of a physical phenomenon is its 
ability to enhance the understanding of the phenomenon. 
What is a mathematical formulation? Basically, it reduces the subject to
a few simple concepts, which are then studied with the help of definitions, 
theorems, and proofs. Hopefully, the mathematics then helps to develop 
one’s intuition about the subject in question, while the physical intuition 
helps to develop and motivate the mathematics. With these preliminary 
remarks, let us return to probability theory. 

In the examples of Section 1, each experiment had several possible
outcomes $s_1, \ldots, s_k$. Furthermore, each outcome $s_i$ had a certain
probability $p_i$, which was thought of as the long-range relative frequency
of the outcome $s_i$. Since the relative frequencies were all between 0 and 1
inclusive, and since they added up to 1 (Equations 1.1 and 1.2), it is
reasonable to suppose that the same is true for the probabilities. In the
mathematical formulation, the outcomes will be called elementary events. The
set of all possible outcomes will be called the probability space S. In this
formulation, we simply think of S as a finite set of elements
$s_1, s_2, \ldots, s_k$, where the nature of these elements is irrelevant.


6 Definition 


A probability space S is a finite set of elements $s_1, s_2, \ldots, s_k$,
called elementary events. Corresponding to each elementary event $s_i$ is a
number $p_i$ called the probability of $s_i$. The numbers $p_i$ must satisfy
the conditions

$$0 \le p_i \le 1 \quad (i = 1, 2, \ldots, k) \tag{1.4}$$

and

$$p_1 + p_2 + \cdots + p_k = 1 \tag{1.5}$$


The terms sample point, simple event, or atomic event are sometimes used
instead of "elementary event." The term "probability space" is used only
when each elementary event is assumed to have a probability. If we wish to
speak of the set S without regard to probabilities, we call S the sample space.
Thus the term "probability space" implies that probabilities of elementary
events are defined, whereas this is not so for the term "sample space."

In the general theory of probability, a sample space can be an infinite set 
and Definition 6 would be the definition of a finite probability space. How- 
ever, we do not define this more general concept, and for simplicity we use 
the term “probability space” instead of the more correct term. 

Note that nothing in Definition 6 is said about the numbers $p_i$ being
"long-range relative frequencies," but it will do no harm if the reader thinks
of $p_i$ in this way. Similarly, there is no harm in geometry when one thinks
of a point as a dot on a paper, despite the understanding that a point is an
undefined term.

We shall also use the notation $p(s_i)$ (read "probability of $s_i$" or simply
"p of $s_i$"). Thus $p(s_i) = p_i$. This notation (called functional notation)
makes clear the dependence of the probability $p_i$ on the elementary event
$s_i$. We shall also let $s$ designate a typical elementary event and $p(s)$
its probability.

Definition 6 is a purely formal one. It permits the construction of probabil- 
ity spaces with relative ease. For example, we may take S = {A, B, C}, with 
p(A) = .8, p(B) = .1, p(C) = .1, and we have constructed a probability space. 
(Note that the numbers .8, .1, and .1 satisfy Equations 1.4 and 1.5.) This
probability space may be put into tabular form as in Table 1.8. The fact that 
this example is so simple merely shows that the idea of a probability space is 
a simple one. Of course, this example is arbitrary and sterile, too. We are 
still in the elementary stage of the subject. 


1.8 Probability Space Illustrated

s       A     B     C
p(s)   .8    .1    .1
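Definition 6 amounts to two checks that are easy to automate. The Python sketch below (our own; `is_probability_space` is a hypothetical name) represents a finite space as a dictionary from elementary events to probabilities and tests conditions 1.4 and 1.5, using exact fractions so that the sum test is not spoiled by floating-point rounding.

```python
from fractions import Fraction


def is_probability_space(p):
    """Check Definition 6 for a finite space given as {elementary event: probability}."""
    values = list(p.values())
    return all(0 <= v <= 1 for v in values) and sum(values) == 1   # (1.4) and (1.5)


# The space of Table 1.8.
S = {"A": Fraction(8, 10), "B": Fraction(1, 10), "C": Fraction(1, 10)}
print(is_probability_space(S))                                          # True
print(is_probability_space({"H": Fraction(1, 2), "T": Fraction(2, 3)}))  # False: sum exceeds 1
```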


We may now summarize the correspondence thus far obtained between 
physical reality and its mathematical formulation. Physically, an experi- 
ment is performed with several possible outcomes. Each of these outcomes is 
thought to have a statistical probability. These probabilities satisfy Equations
1.1 and 1.2. We abstract this situation into a mathematical formulation as 
follows. A set S, called a probability space, is given. Each element s of S is 
called an elementary event and determines a number p(s), called the proba- 
bility of s. These probabilities satisfy Equations 1.4 and 1.5. The correspon- 
dence between reality (the experimental situation) and the mathematical 
model (the probability space) is given in the following table: 


Experimental situation                 Mathematical formulation

Possible results of an experiment  ↔   Elementary events s of a probability space S
Possible outcome s                 ↔   Elementary event s
Statistical probability of s       ↔   p(s)


In future sections we shall attempt to formulate mathematical concepts 
in such a way as to reflect experimental situations. In this way the program 
outlined in the second paragraph of this section can be carried out. 

It is considered respectable to mix the two languages. In the simple coin- 
tossing experiment (Example 4), the outcomes were heads or tails. A prob- 
ability space might be S = {H, T}, with p(H) = .5, p(T) = .5. It would be 
considered quite pedantic in everyday use to speak of “the probability of 
the elementary event H." We are permitted to speak instead of "the
probability of tossing a head," and we shall occasionally make similar
simplifications throughout the text. The context usually makes it clear
whether it is the statistical probability or the more abstract probability
which is under consideration.

Definition 6 does not teach us how to compute probabilities; it merely 
presents these probabilities. In a practical situation, where probabilities are 
unknown, it is necessary to have more information to find these probabilities. 
One of the simplest assumptions which leads to an immediate and explicit 
value for the probabilities is the assumption that the various sample points 
have equal probability. In the real world we would say that the various out- 
comes are equally likely. 


7 Definition 


A probability space $S = \{s_1, \ldots, s_k\}$ is called a uniform probability
space if the values $p_1 = p(s_1), p_2 = p(s_2), \ldots, p_k = p(s_k)$ are all
equal: $p_1 = p_2 = \cdots = p_k$.

8 Theorem 

In a uniform probability space S consisting of k elementary events
$s_1, \ldots, s_k$, the probability of any elementary event is $1/k$:
$p(s_i) = 1/k$ ($i = 1, 2, \ldots, k$).

Proof. By hypothesis, S is a uniform probability space. Hence

$$p_1 = p_2 = \cdots = p_k = p$$

where we have set each $p_i = p$ (the common value). By Equation 1.5 we
have

$$p_1 + \cdots + p_k = 1$$

Hence

$$\underbrace{p + \cdots + p}_{k \text{ terms}} = 1$$

$$kp = 1$$

$$p = \frac{1}{k}$$

This completes the proof.

This theorem is the basis for the computation of various probabilities. For
example, in the coin-tossing experiment, $S = \{H, T\}$. Thus $k = 2$ here,
and $p(H) = 1/2$, $p(T) = 1/2$. Similarly, if we toss a die, we may take
$S = \{1, 2, 3, 4, 5, 6\}$. Hence $k = 6$ and $p = 1/6$. Thus the probability
of tossing a 4 is $1/6 = .167$. If we choose 1 card at random from a shuffled
deck, we may choose S to be the set of possible cards. Hence $k = 52$, and the
probability of choosing (say) the 10 of hearts is $1/52 = .0192$. The same
probability applies to each of the possible cards.
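Theorem 8 translates directly into code. In the Python sketch below (our own illustration; `uniform_space` is a hypothetical helper), each of the k elementary events receives probability 1/k, reproducing the coin, die, and card examples just given.

```python
from fractions import Fraction


def uniform_space(outcomes):
    """Assign p(s) = 1/k to each of the k elementary events (Theorem 8)."""
    outcomes = list(outcomes)
    k = len(outcomes)
    return {s: Fraction(1, k) for s in outcomes}


coin = uniform_space(["H", "T"])
die = uniform_space(range(1, 7))
suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
deck = uniform_space((rank, suit) for suit in suits for rank in ranks)

print(coin["H"], round(float(die[4]), 3), round(float(deck[("10", "hearts")]), 4))
# 1/2 0.167 0.0192
```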

Underlying these elementary computations is the basic assumption of 
Theorem 8: The probability space is uniform. This is surely a reasonable
assumption for cards. For example, why should anyone expect the draw of
a 7 of hearts to be more (or less) likely than the draw of a 3 of clubs? The
obvious assumption to make is that all cards are equally likely to be drawn.
Similarly, if a die is thrown, it seems reasonable to assume that a 3 is just as
likely to turn up as a 4, etc. On the other hand, the reader must beware of
falling into the Aristotelian trap.⁵ Nature cannot be forced. Maybe there is a
bias against drawing an ace of spades! Certainly it seems likely that in the
throw of a die, not all faces are equally likely. Indeed, the die is not perfectly
symmetrical. (The basis of "loading" dice is to make this asymmetry more
pronounced.) Similarly, a coin is not perfectly symmetrical.

Despite these cautionary statements, we shall usually assume that for 
coins, cards, and dice, the appropriate probability space is uniform. (We 
then say that we are dealing with an ideal or fair coin, pack of cards, or die.) 
The justification is as follows. First, as far as the mathematics is concerned, 
it often simplifies the theory considerably. Second, the physical reality (based 
on experiments) is fairly close to this assumption. Finally, the knowledge of
an ideal coin, die, etc., is obviously useful if we wish to study any coin, die, 
etc., to find out how far from the ideal it is. Thus our reasons are analogous 
to the ones that the Greek physicist-geometer might have used when he 
started out considering lines before curves. 


EXERCISES 


1. State whether each of the following are probability spaces. Give 
reasons. 


a. faye ieee 
p(s) | .1 1 21.3 | 41.5 


p(s) 3 3 2 


5 After Aristotle, who “proved” that nature acts in certain ways. Nature refused to comply. 
2. Suppose that S = {A, B, C, D} is a probability space, with p(A) = .1,
p(B) = .4, p(C) = .3. Find p(D). 


3. A uniform space consists of the elements #, !, *, }, and 1. Find p(’). 


4. A uniform probability space consists of all the letters of the English 
alphabet. Evaluate p(K).


5. Let S = {A, B, C}. Suppose that B is twice as likely as A and that B
and C are equally likely. Find p(A), p(B), and p(C).


6. Let S = {a, b, c, d}. Suppose that b is twice as likely as a, c is twice
as likely as b, and d is twice as likely as c. Find p(a).


7. Suppose an experiment has possible outcomes $s_1, \ldots, s_k$ and that
this experiment is repeated $N$ times. Suppose these outcomes occur with
relative frequencies $f_1, \ldots, f_k$, respectively.

a. Are $f_1, \ldots, f_k$ the statistical probabilities of $s_1, \ldots, s_k$,
respectively?

b. If we define $p(s_i) = f_i$, is S a probability space? Explain.


8. Suppose you tell someone to pick a number (i.e., an integer) from 1 to
10 inclusive and to pick it so that you will not be able to guess the number. 
What number will he choose? It seems clear that we have a sample space 
consisting of the 10 numbers 1 through 10. Do you think this is a uniform 
probability space? Explain why. Devise an experiment and try it on 30 
people to check your conjecture. 


9. The hearts are removed from a pack of cards and a card is chosen at 
random from the remaining deck. Find p(3 of clubs). 


10. A card is to be drawn from a well-shuffled deck to determine the statis- 
tical probability of drawing a black card. One possible sample space is {black, 
red}. Is it also possible to use {hearts, diamonds, clubs, spades}? Explain. 
Can {black, red} be used as a sample space if it is desired to determine the 
statistical probability of drawing a spade? Explain. 


3 EVENTS 


The results of an experiment can give facts about the relative frequency of 
occurrences that are not considered elementary events or outcomes. For 
example, suppose an experiment is performed several times to determine 
the suit (hearts, diamonds, clubs, or spades) of a card drawn randomly out 
of a deck. It is natural to take as the sample space S = {He, Di, Cl, Sp}. Suppose
the results of the experiment are summarized in Table 1.9. (Here we use n to
designate frequency and f to denote relative frequency.)

1.9 Suit of a Card

Outcome | He   | Di   | Cl   | Sp   | Total
n       | 56   | 43   | 53   | 48   | 200
f       | .280 | .215 | .265 | .240 | 1.000

Then it would be a very simple matter to use the results of this experiment to find
the relative frequency of obtaining a red card. In fact, the red cards are the
hearts or the diamonds. Hence a red card occurred $n_{He} + n_{Di} = 56 + 43 = 99$
times out of $N = n_{He} + n_{Di} + n_{Cl} + n_{Sp} = 56 + 43 + 53 + 48 = 200$ times. Thus
the relative frequency of a red card is 99/200 = .495. (As we shall soon see,
this result could have been obtained directly by adding the relative frequencies
$f_{He}$ and $f_{Di}$.) Thus we may write red = {He, Di} and

$$f_{red} = f_{He} + f_{Di} \qquad (1.6)$$

⁶ The term “at random” means that the obvious probability space is taken to be uniform.


We now generalize this example to arbitrary experiments. 


9 Definition 

Suppose a probability experiment has the possible outcomes $s_1, \ldots, s_k$. An
event A is any subset of these k outcomes: $A = \{t_1, \ldots, t_p\}$, where the $t_i$'s
are different outcomes. We say that A occurs in an experiment if one of the
outcomes in the subset A occurs. If the experiment is repeated N times, the
frequency of A is the number of times that A occurs. The relative frequency
of A [written f(A)] is defined by the formula

$$f(A) = \frac{n}{N} \qquad (1.7)$$

where n is the frequency of A.


10 Theorem 
Let $A = \{t_1, \ldots, t_p\}$ be an event. Suppose that the probability experiment
is repeated N times and that $t_1$ occurs with relative frequency $f(t_1)$, etc. Then

$$f(A) = f(t_1) + f(t_2) + \cdots + f(t_p) \qquad (1.8)$$


In brief, the relative frequency of A is the sum of the relative frequencies of 
the outcomes that constitute A. 

Proof. Suppose that $t_i$ occurs $m_i$ times $(i = 1, 2, \ldots, p)$. Then, by Definition 2,

$$f(t_i) = \frac{m_i}{N} \qquad (i = 1, 2, \ldots, p)$$

But if n is the frequency of A, clearly n is the sum of the frequencies of the
$t_i$'s. Thus

$$n = m_1 + \cdots + m_p$$

Dividing by N we obtain

$$\frac{n}{N} = \frac{m_1}{N} + \cdots + \frac{m_p}{N}$$

or

$$f(A) = f(t_1) + \cdots + f(t_p)$$

This is the result.
Thus in Table 1.9 the relative frequency of red can be computed using 
Equation 1.6, which is a special case of Equation 1.8. 
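As a check on Theorem 10, the following minimal Python sketch recomputes the relative frequency of a red card from the frequencies quoted for Table 1.9: once by counting draws directly (Definition 9) and once by adding the relative frequencies of He and Di (Equation 1.8). The names `counts`, `draws`, `rel_freq_by_counting`, and `rel_freq_by_summing` are illustrative choices, not notation from the text; exact fractions are used so that no roundoff enters.

```python
from fractions import Fraction

# Frequencies n from the suit experiment summarized in Table 1.9.
counts = {"He": 56, "Di": 43, "Cl": 53, "Sp": 48}

# Expand into one record per draw so that Definition 9 can be applied literally.
draws = [suit for suit, n in counts.items() for _ in range(n)]
N = len(draws)  # 200

def rel_freq_by_counting(event):
    """Definition 9: f(A) = n/N, where n is the number of draws whose outcome is in A."""
    n = sum(1 for d in draws if d in event)
    return Fraction(n, N)

def rel_freq_by_summing(event):
    """Theorem 10: f(A) = f(t_1) + ... + f(t_p), summing outcome relative frequencies."""
    return sum((Fraction(counts[t], N) for t in event), Fraction(0))

red = {"He", "Di"}
print(rel_freq_by_counting(red), float(rel_freq_by_counting(red)))  # 99/200 0.495
print(rel_freq_by_summing(red) == rel_freq_by_counting(red))        # True
```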


11 Example 
Using Table 1.6, find the relative frequency of the event A: “a 6 turns up on
or before the fourth toss.”

Here $s_1$ was the event “a 6 turns up on the first toss,” $s_2$ the event “a 6
turns up first on the second toss,” etc. Thus

$$A = \{s_1, s_2, s_3, s_4\}$$

and the above theorem permits the computation

$$f(A) = f(s_1) + f(s_2) + f(s_3) + f(s_4) = .150 + .117 + .117 + .200 = .584$$

using the results of that table. We may expect a small error because the
figures are not exact but rounded off to 3 decimal places. The actual relative
frequency is

$$f(A) = \frac{9 + 7 + 7 + 12}{60} = \frac{35}{60} = .583$$

to 3 decimal places.
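The small discrepancy between .584 and .583 comes only from rounding the individual terms first. A minimal sketch reproducing it, assuming (as the computation above implies) that Table 1.6 records N = 60 trials with 9, 7, 7, and 12 occurrences of $s_1$ through $s_4$:

```python
from fractions import Fraction

N = 60
counts = [9, 7, 7, 12]  # occurrences of s1, s2, s3, s4 (assumed from Table 1.6)

# Exact: add the frequencies first, then divide by N.
exact = Fraction(sum(counts), N)  # 35/60 = 7/12

# As in the table: round each relative frequency to 3 places, then add.
rounded_terms = [round(c / N, 3) for c in counts]  # .150, .117, .117, .200
approx = round(sum(rounded_terms), 3)

print(exact, round(float(exact), 3))  # 7/12 0.583
print(approx)                         # 0.584
```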

It seems reasonable to compare the relative frequency of A with the 
probability of A. Strictly speaking we have not as yet defined the probability 
of an event, but in view of Theorem 10 it is reasonable to add the probabili- 
ties of the sample points constituting that event. Using the figures of Table 
1.6 we obtain 

p(A) = .167+ .139+ .116+ .096 = .518 


where the last figure is in doubt because of roundoff error. 
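The roundoff doubt can be removed by working with exact fractions. This sketch assumes that the probabilities in Table 1.6 are those of a fair die, so that the first 6 appears on toss k with probability $(5/6)^{k-1}(1/6)$; that formula reproduces the tabulated .167, .139, .116, .096. The function name `p_first_six_on` is hypothetical.

```python
from fractions import Fraction

def p_first_six_on(k):
    """Probability that the first 6 appears on toss k, assuming a fair die."""
    return Fraction(5, 6) ** (k - 1) * Fraction(1, 6)

terms = [p_first_six_on(k) for k in range(1, 5)]
print([round(float(t), 3) for t in terms])  # [0.167, 0.139, 0.116, 0.096]

# Add the exact sample-point probabilities, then round once at the end.
p_A = sum(terms, Fraction(0))
print(p_A, round(float(p_A), 3))            # 671/1296 0.518
```

Rounding only at the end confirms .518 to three places.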
We now formally define the probability of an event. 


12 Definition 
Let $S = \{s_1, \ldots, s_k\}$ be a probability space. An event A is any subset
$\{t_1, \ldots, t_p\}$ of S. The probability of A, written p(A), is defined by the formula

$$p(A) = p(t_1) + \cdots + p(t_p) \qquad (1.9)$$
Definitions 9 and 12 illustrate an important mode of procedure: We try
to define a mathematical concept in terms of the corresponding physical 
phenomenon. Thus Equation 1.9 was obviously motivated by Equation 1.8. 
For if relative frequencies of sample points approximate probabilities of 
elementary events, as we originally intended, then (comparing Equations 
1.8 and 1.9) the relative frequency of an event will approximate its probability. 

Equation 1.9 may be written in the somewhat forbidding form 


$$p(A) = \sum_{s \in A} p(s) \qquad (1.10)$$

[Read: p(A) equals the sum of p(s) for s in A.] Here Σ is the symbol uni-
versally used to designate a sum. The term p(s) following the Σ sign is the
general expression for a term to be added. The statement “s ∈ A” under
the Σ sign is a restriction on the terms p(s) to be added. Thus we add only
those terms p(s) where s is in the set A.
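In programming terms, Equation 1.10 is a sum over the members of a set. A minimal sketch, storing a finite probability space as a dictionary from sample points to exact probabilities; the function name `prob` and the fair-die example are illustrative assumptions.

```python
from fractions import Fraction

def prob(event, p):
    """Equation 1.10: add p(s) over exactly those sample points s that lie in A."""
    return sum((p[s] for s in event), Fraction(0))

# Hypothetical example: the fair-die space, with A = "an even number turns up".
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(prob({2, 4, 6}, die))   # 1/2
print(prob(set(), die))       # 0   (the empty event contributes no terms)
print(prob(die.keys(), die))  # 1   (summing over all of S)
```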

Definition 12 takes on an interesting and useful form when S is a uniform 
probability space. In this case, the probability of a sample point is known 
(Theorem 8), and therefore the probability of an event can be explicitly 
computed. The result is given below in Theorem 14, after we introduce an 
important notation. 


13 Notation 
If A is any finite set, then the number of elements of A is denoted n(A). 


14 Theorem 
Let S be a uniform probability space, and let A be an event of S. Then

$$p(A) = \frac{n(A)}{n(S)} \qquad (1.11)$$

Proof. Suppose $S = \{s_1, s_2, \ldots, s_k\}$, while $A = \{t_1, t_2, \ldots, t_p\}$. By Definition 12,

$$p(A) = p(t_1) + \cdots + p(t_p)$$

By Theorem 8, $p(t_i) = 1/k$ for $i = 1, 2, \ldots, p$. Thus

$$p(A) = \frac{1}{k} + \cdots + \frac{1}{k} \quad (p \text{ summands}) \quad = \frac{p}{k}$$

But n(A) = p and n(S) = k, by definition. Thus

$$p(A) = \frac{n(A)}{n(S)}$$

and the proof is complete.
Historically, Theorem 14 was taken to be the mathematical definition
of probability. The classical definition was phrased as follows: If there are 
x possible outcomes, all equally likely, and if an event can occur in any one 
of y ways, then the probability of this event occurring is y/x. It is seen that 
this is entirely equivalent to Theorem 14. 

Theorem 14 is a device with which many probabilities can be computed. 
All that is necessary to compute the probability of an event is that the 
probability space be uniform. However, it is first necessary to find n(A) 
and n(S). This procedure is of course called counting. It is by no means an 
easy matter if S is a fairly large set. For example, suppose we attempt to 
find the probability that a word, chosen at random from a standard diction- 
ary, has 5 letters. We might interpret this problem as follows: S = the set of 
all words in the dictionary under consideration, made into a uniform probability
space. (This is our interpretation of the word “random.”) A = the set
of 5-letter words in that dictionary. To compute p(A) we use Equation 1.11
to find

$$p(A) = \frac{n(A)}{n(S)}$$


so it is merely necessary to count the number of 5-letter words, and the 
number of words, and finally to divide the former number by the latter 
one—a tedious procedure! On the other hand, some counting is fairly 
routine. For example, if 1 card is chosen from a deck of cards, we may easily 
find the probability that it is a picture card (jack, queen, or king). Here S =
set of cards, and hence n(S) = 52. The event A = set of picture cards. Since
there are 4 jacks, 4 queens, and 4 kings, we have n(A) = 12. Hence (assuming
a uniform space), the required probability is

$$p(A) = \frac{12}{52} = \frac{3}{13} = .231$$

In everyday language, there are 12 chances out of 52, or 3 out of 13, or about
23 out of 100 of choosing a picture card. 
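Counting of this routine kind is easy to mechanize. The sketch below builds the 52-card space as (rank, suit) pairs, assumes it is uniform, and applies Theorem 14; it also checks the 13-point space of ranks discussed next. The rank and suit labels are arbitrary illustrative strings.

```python
from fractions import Fraction
from itertools import product

ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["He", "Di", "Cl", "Sp"]

# S: the 52 cards as (rank, suit) pairs, assumed to form a uniform space.
S = set(product(ranks, suits))
# A: the picture cards (jack, queen, king).
A = {card for card in S if card[0] in {"J", "Q", "K"}}

p_A = Fraction(len(A), len(S))  # Theorem 14: n(A)/n(S)
print(len(S), len(A), p_A, round(float(p_A), 3))  # 52 12 3/13 0.231

# The 13-point space of ranks, also taken as uniform, gives the same value.
p_rank = Fraction(len({"J", "Q", "K"}), len(ranks))
print(p_rank == p_A)  # True
```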

In an experimental situation the choice of a sample space is somewhat 
arbitrary. In the above situation, it would have been valid to choose as 
the sample space the various ranks: S = {ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}.
The event A would be {J, Q, K}. Since it is “reasonable” to suppose S
uniform, we would have p(A) = 3/13 directly. On the other hand, we might have
chosen S to be the simpler sample space: S = {picture, no picture}. How- 
ever, it is “unreasonable” to assume S to be uniform, and we cannot apply 
the simple formula 1.11. 

We also point out that our definition of an event is broad, because it 
includes every possible subset of S. In particular, it is understood to include 
the empty set ∅ (a convenient set that includes no elements at all). ∅ is also
called the impossible event. In this case we interpret Definition 12 (Equation
1.9) to include the equation

$$p(\varnothing) = 0 \qquad (1.12)$$


as a special case. This definition is certainly a reasonable one from the 
point of view of relative frequency. Indeed, the impossible will not happen, 
and its relative frequency is 0. Another event of passing interest is the
event S itself. S is also called the certain event, because S will clearly occur.
We have

$$p(S) = 1 \qquad (1.13)$$


This is also seen to be true from a relative-frequency point of view. Mathe- 
matically, the reader should convince himself that this is a consequence of 
Definition 12 and Equation 1.5. 

Another special case is an event consisting of only one elementary event: 
$A = \{s_i\}$. Equation 1.9 implies, in this case, that $p(A) = p(s_i)$. Although
there is a logical distinction between an element and a set that contains only
that element, usage and notation blur that distinction with very little, if any,
confusion. It makes little difference whether we regard the drawing of an
ace of spades as a sample point or as an event. The probabilities are the same.

Finally, we make some remarks on usage. In the real world, or even the 
mathematical one, we are seldom presented with the simple problem “Find 
the probability of an event A.” Rather, it is customary and convenient to 
use some circumlocution. If a set A is given, we often ask: What is the 
probability that a sample point s is in A? This means: What is the value of 
p(A)? Similarly, in the above illustration, the question “What is the prob- 
ability that the chosen card is a picture card?” was answered by first 
defining the set A of picture cards and then finding p(A). In the same way we 
find the “probability of tossing an even number on 1 die” by first defining 
the uniform probability space S = {1, 2, 3, 4, 5, 6} and the set A of even 
numbers (A = {2, 4, 6}) and then finding p(A) = n(A)/n(S) = 3/6 = 1/2.


EXERCISES 


1. Let S={X, Y,Z}. List all the possible events. (Do not neglect the 
empty set.) 


2. Let S be the probability space whose table is as follows: 
a. Find the probability that s is a vowel. What is the associated event A?
b. What is the probability that s is a, b, or c? What is the associated 
event A? 


3. Referring to Table 1.3 find the probability that when 3 dice are 
thrown, the highest number showing is even. What is the associated event? 
Compare your figure with the relative frequency of even numbers in each 
series of 100 experiments. 


4. Referring to Table 1.6 find the probability that 9 or more tosses of a 
die are required before a 6 turns up. Give the associated event. Compare 
with the relative frequency in that table. 


5. Using the results in the table of Exercise 2, Section 1, find the prob- 
ability that when 2 dice are thrown, the sum of the numbers turning up is 
7 or 11. Give the associated event. Compare with the relative frequency as 
indicated in the table. Also, find the probability of tossing a sum that is 
either 6, 7, or 8. 


6. A card is drawn at random from a standard deck of cards. What is 
the probability that the card is an ace and/or a spade? List the sample points 
in the associated event. 


7. A number (i.e., an integer) is chosen at random from 1 to 100 inclusive.
What is the probability that the digit 9 appears in that number? List the 
sample points in the associated event. 


8. A license-plate number begins with either a letter or a number. 
Assuming that the sample space of letters and numbers is uniform and that 
the number 0 and letter O are distinguishable from each other, what is the 
probability that a license plate begins with a number? 


9. An integer is chosen at random between 1 and 21 inclusive. Find
the probability that
a. it is less than 10. b. it is divisible by 3.
c. it is divisible by 5. d. it is divisible by 3 but not by 6.
e. it is divisible by 7 but not by 3.


10. Suppose S is the set of integers from 1 through 100. Let A = the even
integers of S, B = the integers divisible by 7 in S, C = the integers less than
or equal to 10 in S, and D = the perfect squares in S. Find

a. n(S), n(A), n(B), n(C), n(D). 
b. p(S), p(A), p(B), p(C), p(D). 
Assume that S is a uniform probability space. 


11. Do Exercise 9 if the integer was chosen from 1 to 100.


12. Do Exercise 9 if the integer was chosen from 1 to 1,000.
13. Let S = {x, y, z, u, v} be a sample space as follows:


a. Let A = {x, y, u}. Find n(A). Find p(A).
b. Compute $\sum_{s \in \{u, v\}} p(s)$.
c. Write p(x) + p(y) + p(v) using the “Σ” notation.


14. A certain radio station only plays the “top 25” hit tunes, and they are 
played at random. However, half of the air time is devoted to random 
announcer chatter (humor, advertising, station identification, etc.). Set up 
a sample space for what you will hear when you turn on the radio. What is 
the probability that you will hear a song in the “top 10”? (Assume, for 
simplicity, that each tune lasts 3 minutes.) 


4 SOME EXAMPLES 


We now consider some examples that will illustrate some of the ideas of 
the preceding sections. 


15 Example 
When 2 dice are thrown, what is the probability that the sum of the numbers 
thrown is 5? 

The natural sample space to choose is the possible ways the 2 dice can 
fall. This is best illustrated in Fig. 1.10. We call one die A and the other B; 


1.10 Sample Space for 2 Dice 
the outcomes for A (1, 2, 3, 4, 5, or 6) are put in the first column and those
for B are put in the first row. The entries in the figure represent all the 
possible outcomes of the experiment. Thus the entry (x, y) in row x and 
column y signifies that A turns up x and B turns up y. The elementary events 
consist of all (x,y), where x= 1,2,...,6 and y=1,2,...,6. Figure 1.10 
only lists a few typical entries. 

If we imagine that die A is readily distinguished from die B (say A is 
green, B is red), then we can see that the sample points (2, 3) and (3, 2) are 
distinguishable outcomes. Furthermore, it seems reasonable to suppose that 
the 36 outcomes are equally likely. Indeed, why (for example) should (4, 6) 
be any more, or less, likely than (2, 3)? This is no proof, of course, because
the ultimate proof is experimental.⁷ Thus we shall assume that the probability
space is uniform.

If the dice are similar in appearance, then the experimenter will not be 
able to distinguish (say) (1, 3) from (3, 1). Nevertheless, the dice are distinct 
and we may assume the sample space to be the uniform space of 36 elements 
as in Fig. 1.10. 

We can now easily answer the question proposed in Example 15. The 
event “sum = 5” consists of the elementary events {(1, 4), (2,3), (3, 2), 
(4, 1)}. There are 4 such points. Also (see Fig. 1.10), there are 36 = 6 × 6
elementary events in S. Using Theorem 14 we have

$$p(\text{sum is } 5) = \frac{4}{36} = \frac{1}{9} = .111$$


(Compare with Exercise 2 of Section 1, where this probability was stated 
without proof.) 
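The count of 4 favorable points out of 36 can be confirmed by listing the space of Fig. 1.10 explicitly. A minimal sketch, assuming (as the text does) that the 36 ordered pairs are equally likely:

```python
from fractions import Fraction
from itertools import product

# The 36 points (x, y) of Fig. 1.10: die A shows x, die B shows y.
S = list(product(range(1, 7), repeat=2))
sum_is_5 = [pt for pt in S if pt[0] + pt[1] == 5]

print(sum_is_5)                         # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(Fraction(len(sum_is_5), len(S)))  # 1/9
```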

We can also easily find the probability that the sum on the dice is any of 
the possibilities 2, 3, ..., 12. Referring to Fig. 1.11, where the elementary
events are illustrated as dots, the associated events $A_2, A_3, \ldots, A_{12}$ are seen
to contain the number of elementary events indicated in Table 1.12. This
table also gives the probabilities reduced to lowest terms, and the probabilities
in percent.

1.11 Two-Dice Sample Space, with Events Corresponding to the Various Sums
[Figure: the 36 points of Fig. 1.10 grouped by sum, e.g., $A_2 = \{(1, 1)\}$, $A_3 = \{(1, 2), (2, 1)\}$, $A_4 = \{(1, 3), (2, 2), (3, 1)\}$.]

⁷ In Section 5 of Chapter 3, however, we shall consider this assumption in more detail.


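1.12 Sum on 2 Dice

Sum on 2 dice           | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
Number of sample points | 1    | 2    | 3    | 4    | 5    | 6    | 5    | 4    | 3    | 2    | 1
Probability             | 1/36 | 1/18 | 1/12 | 1/9  | 5/36 | 1/6  | 5/36 | 1/9  | 1/12 | 1/18 | 1/36
Probability (percent)   | 2.8  | 5.6  | 8.3  | 11.1 | 13.9 | 16.7 | 13.9 | 11.1 | 8.3  | 5.6  | 2.8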
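Table 1.12 can be regenerated by tallying the sum over every point of the uniform 36-point space. A minimal sketch; `Fraction` reduces each probability to lowest terms automatically, matching the table's convention.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# n(A_s) for each sum s, counted over the uniform 36-point space.
counts = Counter(x + y for x, y in product(range(1, 7), repeat=2))

for s in range(2, 13):
    p = Fraction(counts[s], 36)  # reduced to lowest terms
    print(f"{s:>2}  {counts[s]}  {str(p):>4}  {100 * float(p):4.1f}")
```

Each printed row gives the sum, the number of sample points, the probability, and the percentage, in that order.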
16 Example 


Five coins are tossed. What is the probability that exactly 2 heads turn up? 

If the coins are called A, B, C, D, and E the natural sample space consists 
of all the distinguishable occurrences of heads and tails. Letting HHTHH 
denote the occurrence of H, H, T, H, and H on A, B, C, D, and E, respec- 
tively, etc., we can easily construct the sample space of all possible combina- 
tions of heads and tails. In alphabetical order (reading downward from the 
first column) the sample space S is given in Fig. 1.13.


1.13 Sample Space for 5 Tossed Coins 


HHHHH    HTHHH    THHHH    TTHHH
HHHHT    HTHHT    THHHT    TTHHT ✓
HHHTH    HTHTH    THHTH    TTHTH ✓
HHHTT    HTHTT ✓  THHTT ✓  TTHTT
HHTHH    HTTHH    THTHH    TTTHH ✓
HHTHT    HTTHT ✓  THTHT ✓  TTTHT
HHTTH    HTTTH ✓  THTTH ✓  TTTTH
HHTTT ✓  HTTTT    THTTT    TTTTT


By actual count, S contains 32 elements: n(S) = 32. If we let $A_2$ = the set
of elementary events with exactly 2 heads occurring, we find $n(A_2) = 10$ by
counting. (The elements of $A_2$ are checked in Fig. 1.13.) Thus
$p(A_2) = n(A_2)/n(S) = 10/32 = .312$. Here we are assuming that S is a uniform
space, i.e., that each of the outcomes in Fig. 1.13 is equally likely. This seems
reasonable enough, although a “proof” must be obtained empirically. As with the dice
experiment of Example 15, we shall analyze this assumption in greater 
detail in a later section. 
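The count $n(A_2) = 10$ can be checked by generating the 32 outcomes of Fig. 1.13 mechanically. In this sketch, `itertools.product("HT", repeat=5)` happens to produce them in the same alphabetical order as the figure; uniformity of S is assumed, as in the text.

```python
from fractions import Fraction
from itertools import product

# All 32 head/tail sequences for coins A..E, in the alphabetical order of Fig. 1.13.
S = ["".join(t) for t in product("HT", repeat=5)]
A2 = [s for s in S if s.count("H") == 2]  # exactly 2 heads

print(len(S), len(A2))            # 32 10
print(Fraction(len(A2), len(S)))  # 5/16
```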

By further inspection of Fig. 1.13, we can find the probabilities of no 
heads, 1 head, etc. These are listed in Table 1.14. 


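1.14 Number of Heads Among 5 Tossed Coins

Number of heads         | 0    | 1    | 2    | 3    | 4    | 5
Number of sample points | 1    | 5    | 10   | 10   | 5    | 1
Probability             | 1/32 | 5/32 | 5/16 | 5/16 | 5/32 | 1/32
Probability (percent)   | 3.1  | 15.6 | 31.2 | 31.2 | 15.6 | 3.1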
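The whole of Table 1.14 follows from the same enumeration. As a cross-check, this sketch also compares each count with `math.comb(5, k)` (Python 3.8 or later), the number of ways of choosing which k of the 5 coins show heads; that comparison is an added check, not part of the text's argument.

```python
from fractions import Fraction
from itertools import product
from math import comb

S = ["".join(t) for t in product("HT", repeat=5)]

rows = []
for k in range(6):
    n_k = sum(1 for s in S if s.count("H") == k)  # n(A_k) by direct count
    rows.append((k, n_k, Fraction(n_k, len(S))))

for k, n_k, p in rows:
    # heads, sample points, probability, percent, agreement with comb(5, k)
    print(k, n_k, p, f"{100 * float(p):.1f}", n_k == comb(5, k))
```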


It is worth noting that if we were to run an experiment to determine the
relative frequencies of no heads, 1 head, etc., we should probably choose
as our sample space the 6 outcomes $s_0, s_1, \ldots, s_5$. The advantage, however,
of using the 32 outcomes of Fig. 1.13 over these 6 outcomes is apparent in
our theoretical analysis. We chose the 32 occurrences of Fig. 1.13 because
we had reason to believe that this was a uniform space, and we were thus
able to compute probabilities. When the larger uniform space was used, our
“outcome” $s_i$ was reinterpreted as an “event” $A_i$.


17 Example 
A player has 2 coins and plays the following game. He tosses 1 coin. If a
head occurs, he wins. If not, he tosses the other coin, and if a head occurs 
then, he also wins. But if not, he then loses the game. What is his probability 
of winning? 

A natural sample space would be all the outcomes H, TH, TT. Of these, 
the first two constitute the event of winning. Hence we might say that 


$$p(\text{winning}) = \frac{2}{3}$$


But we may also reason that if the player wins on his first try, it does him 
no harm to toss the second coin just to see what happens. The sample space 
is then HH, HT, TH, TT. In this case there are three ways of winning and 
four possible outcomes. Hence we might say that 


$$p(\text{winning}) = \frac{3}{4}$$


Which is right? We choose the latter result because we assume that the 
4-point space is uniform (all outcomes are equally likely), and hence we may 
correctly use Theorem 14, which permits us to compute probabilities by
counting. The first space is not uniform. Indeed, most people would bet on 
H rather than on TT. A good empirical way of checking this result is to 
play this game many times and find the relative frequency of winning. 

Note that we obtained a uniform space by means of an artifice. We thought 
of H as 2 outcomes HH and HT. 
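The artifice can be spelled out in a few lines. This sketch enumerates the uniform 4-point space, counts the winning points, and then shows that the 3-point space {H, TH, TT} carries the same answer only if H is given weight 2/4 rather than 1/3; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The uniform 4-point space obtained by always tossing both coins.
S = ["".join(t) for t in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']
win = [s for s in S if "H" in s]                    # a head on either toss wins
p_win = Fraction(len(win), len(S))
print(win, p_win)  # ['HH', 'HT', 'TH'] 3/4

# Collapsing HH and HT into the single outcome H gives the 3-point space
# {H, TH, TT}; its probabilities are unequal, so Theorem 14 does not apply.
p3 = {"H": Fraction(2, 4), "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
print(p3["H"] + p3["TH"] == p_win)  # True
```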

This example is interesting historically, because the famous and respected 
mathematician D’Alembert claimed that the probability was 2/3 and stuck to
his guns for a long while. If the reader falls into the trap of assuming that 
any probability space is uniform, he now has the knowledge that it happened 
before. 


18 Example 

Of the 10 families living on a certain street, 7 are opposed to air pollution 
and 3 are either in favor or undecided. A polltaker, not knowing where people 
stand on this issue, chooses 2 families at random. What is the probability
that both families he chose are opposed to air pollution? What is the probability
that neither family opposes air pollution?

If we label the families that oppose pollution A, B, C, D, E, F, G and the
families that are in favor or undecided H, I, J, then we have the sample
space of Fig. 1.15. Here the situation is similar to the 2-dice space (Fig. 
1.10) except that the diagonal (A,A), (B, B), etc., is excluded, because the 
polltaker knows better than to bother a family twice. Here, for example,
(F,A) is in row F, column A, and represents interviewing F first and then 
A. There are 10 × 10 = 100 squares, of which the main diagonal (upper left
to lower right) has been excluded. Hence we have a total of 10² − 10 = 90
sample points. The lightly shaded squares represent the event OPPOSED,
that the 2 families interviewed were opposed to air pollution. We readily see
that n(OPPOSED) = 7² − 7 = 7(7 − 1) = 42. Therefore, the probability that
both families will be opposed to air pollution is 


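$$p(\text{OPPOSED}) = \frac{42}{90} = \frac{7}{15} \quad (= 46.7\%)$$

The probability that both families will be in favor or undecided is, using the
3(3 − 1) = 6 squares with dots,

$$p(\text{POLLUTE}) = \frac{6}{90} = \frac{1}{15} \quad (= 6.7\%)$$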
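The 90-point space of Fig. 1.15 is exactly the set of ordered pairs of distinct families, which `itertools.permutations(families, 2)` generates. A minimal sketch, assuming (as the text does) that these 90 points are equally likely:

```python
from fractions import Fraction
from itertools import permutations

opposed = set("ABCDEFG")  # 7 families opposed to air pollution
others = set("HIJ")       # 3 in favor or undecided
families = sorted(opposed | others)

# Ordered pairs (first interview, second interview), no family interviewed twice.
S = list(permutations(families, 2))
both_opposed = [pt for pt in S if set(pt) <= opposed]
neither = [pt for pt in S if set(pt) <= others]

print(len(S), len(both_opposed), len(neither))  # 90 42 6
print(Fraction(len(both_opposed), len(S)))      # 7/15
print(Fraction(len(neither), len(S)))           # 1/15
```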


19 Example 


From our point of view, an urn is a bowl, box, or receptacle containing 
marbles, coins, pieces of paper, or other objects that feel the same to a blind- 
folded person but have certain identifiable characteristics. Using an urn, we 
obtain a very realistic physical model of a uniform probability space. For 
example, if 15 billiard balls (numbered 1 through 15) are put in a box and 
shaken, then when we choose one of these balls without looking, we are 
reasonably certain that each ball is equally likely to be chosen. The same 
principle applies to playing bingo or to a drawing in a lottery. A good way for 
the polltaker of Example 18 to decide which 2 families to choose is to put the 
names of the 10 families on pieces of paper, mix thoroughly, and choose 2 
slips of paper. 

Using urns, we can even create nonuniform probability spaces. For 
example, if an urn contains 7 green marbles, 5 red marbles, and 1 blue 
marble, we may label the marbles conveniently as follows: 


$$g_1,\ g_2,\ g_3,\ g_4,\ g_5,\ g_6,\ g_7,\ r_1,\ r_2,\ r_3,\ r_4,\ r_5,\ b_1$$

Then the events G (green), R (red), and B (blue) are defined in the natural
manner: $G = \{g_1, g_2, \ldots, g_7\}$, $R = \{r_1, r_2, \ldots, r_5\}$, $B = \{b_1\}$. The probabilities
for these events are governed by Theorem 14. Hence p(G) = 7/13, p(R) =
5/13, p(B) = 1/13, and we may regard this model as a 3-point, nonuniform model.
In much the same way, any finite probability space may be realized by an 
urn model, as long as the probabilities are rational numbers (i.e., fractions). 
Since irrationals may be closely approximated by rationals, we can certainly 
approximate any finite sample space by an urn model. This is a useful idea, 
for it shows that it is not too unrealistic to consider uniform probability
spaces: all finite sample spaces may be approximated by them.
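Both directions of the urn idea can be sketched in Python. The first part recomputes p(G), p(R), p(B) for the 13 labeled marbles; the second realizes any finite space with rational probabilities as an urn by putting every probability over a common denominator. The helper names `p_color` and `marbles_for` are hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

# The urn of Example 19: marbles g1..g7, r1..r5, b1, each equally likely to be drawn.
urn = [f"g{i}" for i in range(1, 8)] + [f"r{i}" for i in range(1, 6)] + ["b1"]

def p_color(prefix):
    """Theorem 14 on the uniform 13-point space: n(event) / n(S)."""
    return Fraction(sum(1 for m in urn if m.startswith(prefix)), len(urn))

print(p_color("g"), p_color("r"), p_color("b"))  # 7/13 5/13 1/13

def marbles_for(space):
    """Marble counts realizing a finite space whose probabilities are rational.

    `space` maps each outcome to a Fraction; the counts use a common denominator.
    """
    d = lcm(*(q.denominator for q in space.values()))
    return {s: int(q * d) for s, q in space.items()}

print(marbles_for({"x": Fraction(1, 2), "y": Fraction(1, 3), "z": Fraction(1, 6)}))
# {'x': 3, 'y': 2, 'z': 1}
```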
EXERCISES


1. Using the (uniform) probability space of Fig. 1.10, find the prob- 
ability that, when 2 dice are thrown, at least one 4 shows up. Indicate the 
associated event on a diagram. 


2. As in Exercise 1, find the probability that when 2 dice are thrown, at
least one 1 or 6 shows up. Draw a diagram of this event.


3. As in Exercise 1, find the probability that the difference between the
high and low number thrown is 2 or more. 


4. In analogy with the 3-dice experiment (Example 1 of Section 1), find
the probability that, when 2 dice are tossed, the highest number thrown is 6.
Similarly, find the probability that the high number is 5, 4, 3, 2, and 1. Sketch
the associated events in a single diagram. 


5. Using the (uniform) probability space of Fig. 1.13, find the probability 
that, when 5 coins are tossed consecutively, 
a. the first coin lands heads. 
b. the first 2 coins land heads. 


6. As in Exercise 5, find the probability that when 5 coins are tossed 
consecutively, a run of 3 or more heads in a row occurs. List the sample 
points of the associated event. 


7. As in Exercise 5, find the probability that when 5 coins are tossed consecutively,
the sequence HTH occurs somewhere during the run.


8. As in Exercise 7, find the probability that either HTH or THT occurs 
somewhere during the run. 


9. Three coins are tossed. List the sample points. Find the probability of 
a. no heads. b. 1 head. c. 2 heads. d. 3 heads.


10. Was D’Alembert originally correct? Play the game of Example 17 
one hundred times and count the number of wins (W) and losses (L). (Do 
not use the artifice of the unneeded toss.) Compute the relative frequency, 
and compare with .667 (incorrect) and .750 (correct). Suppose you decide 
that a relative frequency of .75 or more verifies the text answer of .750, and 
a relative frequency of .67 or less verifies the (allegedly incorrect) answer. 
Which answer, if any, have you verified?⁸


⁸ We have probabilities within probabilities here. Assuming that .75 is the correct probability,
we shall later learn how to find the probability that in a run of 100 experiments, 75 or more wins
occur, and similarly that 67 or fewer occur. The same probabilities can be computed on the
assumption that the probability of a win is 2/3. This is summarized in the following table:

[Table: Probability of 75 or more wins | Probability of 67 or fewer wins | Probability of no determination; entries illegible in the source.]

Thus, even assuming that p = 3/4, about 4 percent of all people doing this problem will find 67
or fewer wins and will decide p = 2/3. Note that 41 percent will come to no conclusion. If 1,000
experiments were run, the figures in this table could be sharpened. The above method of
decision is not the one that a statistician would choose. But regardless of the statistician’s
method, he will always have a clause to the effect that with a small probability, the wrong
decision will be made.

30 ELEMENTARY PROBABILITY THEORY

11. Alex, Bill, Carl, Dave, Emil, and Fred are members of a club. They
decide to choose the president and vice president at random. (They use an
urn, of course.) Using the technique of Example 18, find the probability that
a. Carl is an officer.
b. Alex and Bill are not officers.
c. either Dave or Emil is president.
d. Fred is president and/or Bill is vice president.
e. Dave resigns from the club. (He will, if Bill becomes president.)


12. An urn contains 6 white balls and 1 black ball. Two balls are drawn at
random. What is the probability that both are white? What is the probability
that the black ball is chosen?


13. A gambling house offers the following game for the amusement of its
customers. Two dice are tossed. If two 6’s turn up, the customer wins
$10.00. If only one 6 turns up, the customer wins $1.00. If no 6’s turn up,
he loses $1.00. Find the respective probabilities of winning $10.00, winning
$1.00, and losing $1.00.


14. Gary and Hank are evenly matched rummy players. They engage in a
championship match in which the best 3 out of 5 wins the championship.
Once either player has won 3 games, the tournament ends and that player is
declared the winner. Set up a probability space. Be sure you state the
probabilities for each elementary event.


5 SOME GENERALIZATIONS

We have thus far set the stage for a theoretical study of probability using the
notion of a probability space as the unifying idea behind random events.
Although the motivating force for the definition was the idea of statistical
probability, Definition 6 is so general that it can be used to describe a variety
of situations. Let us now consider some of these. 


Statistical Results as Sample Space. If an experiment has outcomes
$s_1, s_2, \ldots, s_k$, which occur with frequencies $n_1, n_2, \ldots, n_k$, respectively,
Definition 2 gives the relative frequencies $f_1, f_2, \ldots, f_k$ of
these outcomes. These relative frequencies satisfy Equations 1.1 and 1.2,
which are precisely what probabilities are required to satisfy (Equations 1.4
and 1.5). Thus the outcomes $s_1, \ldots, s_k$ with relative frequencies $f_1, \ldots, f_k$
form a probability space, which we may call the statistical probability space,
determined by the results of the experiment.

It is not even necessary for the outcomes to be the result of a series of 
random experiments. Any table giving the number of occurrences of various 
events can be interpreted in this manner. For example, consider the following 
(imaginary) information about the students in a certain college: 


Sleeping Habits of Students

Habit                              | Relative frequency
Sleep over 10 hours per day        | .30
Sleep over 8 hours, less than 10   | .42
Sleep 8 hours or less per day      | .23
Never sleep                        | .05


Clearly, this may be regarded as a 4-element probability space. The 
numbers .30, .42, .23, and .05 are called relative frequencies rather than 
probabilities in order not to mislead the unwary. 
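Whether a table like this qualifies as a probability space can be checked mechanically. The sketch below takes Equations 1.4 and 1.5 to be the requirements that each p(s) lie between 0 and 1 and that the p(s) add up to 1 (an assumption based on the surrounding description of relative frequencies); the function name `is_probability_space` and the short row labels are hypothetical. Decimal strings are passed to `Fraction` so that .30 + .42 + .23 + .05 is summed exactly, without floating-point error.

```python
from fractions import Fraction

def is_probability_space(p):
    """True if every p(s) lies in [0, 1] and the p(s) add up to exactly 1."""
    values = [Fraction(v) for v in p.values()]
    return all(0 <= v <= 1 for v in values) and sum(values) == 1

# Relative frequencies from the (imaginary) sleeping-habits table.
sleep = {"over 10 h": "0.30", "8 to 10 h": "0.42", "8 h or less": "0.23", "never": "0.05"}
print(is_probability_space(sleep))  # True

# Any table of frequencies n_i, divided by N = n_1 + ... + n_k, passes as well.
counts = {"He": 56, "Di": 43, "Cl": 53, "Sp": 48}
N = sum(counts.values())
print(is_probability_space({s: Fraction(n, N) for s, n in counts.items()}))  # True
```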


Statistical Probability as Sample Space. This is the application originally 
intended for a sample space to describe. If an experiment has possible outcomes
$s_1, \ldots, s_k$, we take as a physical fact that there are numbers $p_1, \ldots, p_k$
satisfying Equations 1.4 and 1.5 which are approximately the relative frequencies
of $s_1, \ldots, s_k$ if the experiment is repeated a large number of times.
We may say that the probability space postulated in this way is the limit of 
statistical sample spaces as in the case above. 


Finite Sets as Sample Space. If $S = \{s_1, \ldots, s_k\}$ is any finite set with k
elements, we may make S into a uniform probability space by defining
$p(s_i) = 1/k$ for $i = 1, \ldots, k$. Clearly Equations 1.4 and 1.5 hold. In this case
Theorem 14 holds. Recalling that if A is any set, n(A) is the number of
elements in A and that n(S) = k, we see by Theorem 14 that

$$p(A) = \frac{n(A)}{k} \qquad (1.14)$$

or

$$n(A) = k\,p(A) \qquad (1.15)$$


By making any finite set into a uniform probability space, we may use 
Equations 1.14 and 1.15 to translate statements about numbers of elements 
into statements about probabilities, and conversely. For example, the statement
“There are 5 even numbers among the first 10 positive integers”
concerns S = {1, 2, ..., 10}, A = {2, 4, 6, 8, 10}, k = 10, n(A) = 5, and
therefore p(A) = 5/10 = 1/2. We may interpret this statement as “The probability
of choosing an even number among the first 10 positive integers is 1/2.” It is
not necessary to regard this latter statement from the point of view of running 
a large number of experiments with urns. Rather, by Equation 1.15, it can 
be interpreted solely in terms of the relative number of even numbers among 
the numbers of S. However, the probability statement gives us information 
about, and perhaps a feeling for, the scarcity, or density, of even numbers. 
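Equations 1.14 and 1.15 amount to a one-line computation. In this Python sketch the helper `uniform_probability` is a hypothetical name; it divides the number of elements of the event by the number of elements of the finite set, using exact fractions, and then recovers n(A) from p(A).

```python
from fractions import Fraction

def uniform_probability(event, space):
    """p(A) = n(A)/k in the uniform probability space on the finite set `space` (Eq. 1.14)."""
    event = set(event)
    if not event <= set(space):
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))

S = set(range(1, 11))               # the first 10 positive integers
A = {x for x in S if x % 2 == 0}    # the even ones
p = uniform_probability(A, S)
print(p)                 # 1/2
print(len(S) * p)        # 5, recovering n(A) = k p(A) (Eq. 1.15)
```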

We can give a simple example of each of these three kinds of applications 
by considering the game of Example 17. Suppose this game is attempted 20 
times and is won 16 times (relative frequency 16/20 = .80) and lost 4 times
(relative frequency .20). We can apply the 3 notions above to the same set of 
outcomes as follows: 


A. Statistical relative frequencies 


B. Statistical probabilities 


C. Uniform probabilities 


The 3 sets of figures are interpreted as follows: Row A gives the relative 
frequencies of what actually happened in the particular run of 20 experi- 
ments. Row B gives the theoretical probabilities. In a very large run of 
experiments, we may expect the relative frequencies to be near these figures. 
Row C merely implies that there are 2 outcomes, each outcome representing
1/2 of the total number of outcomes. In actual practice it will always be clear
which application is being used. In Section 4 we used C (finite sets as 
probability space) and thought of the results as applying to B (statistical 
probability as probability space). 

An important generalization of our notion of a sample space is that of an 
infinite discrete sample space. Definition 6 postulated finitely many (k) 
outcomes. But in many natural situations infinitely many outcomes are
conceivable. This was seen in Example 5, in which a die was tossed until a 6
turned up. The natural outcomes were $s_1, s_2, s_3, \ldots$ indefinitely, because we
could not definitely exclude any possibility. We should also include $s_\infty$, the
case where no 6 ever turns up. In this example we could be relatively sure
that, for example, a 6 would turn up before 1,000 throws. Yet an infinite
sample space seemed to be called for. If we do use this infinite space, we can
easily interpret the event “6 takes 13 or more tosses before arriving” as the
event $A = \{s_{13}, s_{14}, \ldots, s_\infty\}$. Here A is an infinite set. Even so, its probability
is fairly small.
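How small? This Python sketch assumes, ahead of the later derivation, that the tosses are independent and each shows a 6 with probability 1/6. Then $p(s_n) = (5/6)^{n-1}(1/6)$, $p(s_\infty) = 0$, and A occurs exactly when the first 12 tosses show no 6. The helper name `p_first_six_at` is hypothetical.

```python
from fractions import Fraction

# Assumption (not derived in this section): independent tosses, each showing
# a 6 with probability 1/6, so p(s_n) = (5/6)**(n-1) * (1/6).
def p_first_six_at(n):
    return Fraction(5, 6) ** (n - 1) * Fraction(1, 6)

# Event A = {s_13, s_14, ..., s_inf}: the first 12 tosses show no 6.
p_A = Fraction(5, 6) ** 12
print(float(p_A))   # about 0.112

# Cross-check: the probabilities of s_1, ..., s_12 account for everything else.
assert p_A == 1 - sum(p_first_six_at(n) for n in range(1, 13))
```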

This is an example of an infinite discrete sample space. Its sample points,
while infinite in number, can be enumerated $s_1, s_2, \ldots$.⁹ There is one sample
point for each positive integer. The general definition is a rather straight- 
forward generalization of Definition 6. 


20 Definition 


An infinite discrete sample space S is a set of points $s_1, s_2, \ldots$ (one for each
positive integer) called sample points. Corresponding to each sample point
$s_i$ is a number $p_i$ called the probability of $s_i$. The numbers $p_i$ must satisfy the
conditions

$$0 \le p_i \le 1 \qquad (i = 1, 2, 3, \ldots) \qquad (1.16)$$


$$p_1 + p_2 + \cdots = \sum_{i=1}^{\infty} p_i = 1 \qquad (1.17)$$


Here Equation 1.17 is an infinite series. The meaning of this equation is 
that if enough terms are taken in this sum, the finite sum will be as close to 1 
as we want it to be. Thus we can make $\sum_{i=1}^{n} p_i$ greater than .999 (and less
than or equal to 1.000) by choosing n large enough. Similarly, we can
exceed .9999, etc. Therefore, from a probability point of view, the entire 
sample space is practically concentrated in finitely many sample points. 
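To see Equation 1.17 at work numerically, take as an illustrative assumption the coin space $p_i = 1/2^i$ described later in this section. This Python sketch (the function name `terms_needed` is hypothetical) finds the smallest n whose partial sum exceeds a given threshold. The threshold must be less than 1, or the loop would never end.

```python
from fractions import Fraction

def terms_needed(threshold, p=lambda i: Fraction(1, 2**i)):
    """Smallest n with p(1) + ... + p(n) > threshold, and that partial sum.

    The default p_i = 1/2**i is only an illustrative infinite discrete space;
    any p_i satisfying Eq. 1.17 could be passed in instead."""
    total, n = Fraction(0), 0
    while total <= threshold:
        n += 1
        total += p(n)
    return n, total

print(terms_needed(Fraction(999, 1000))[0])    # 10
print(terms_needed(Fraction(9999, 10000))[0])  # 14
```

Each extra 9 in the threshold costs only a few more terms here, because the terms shrink geometrically.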

We may also look upon Equation 1.17 in the following way. If probabilities
are desired to 4 decimal places, we need only choose finitely many sample
points, say $s_1, s_2, \ldots, s_{100}$, and the probability of the event $\{s_1, \ldots, s_{100}\}$,
which is $p_1 + \cdots + p_{100}$, will equal 1 to 4 decimal places. The probability of
the event $\{s_{101}, s_{102}, \ldots\}$ will be smaller than .00005 and will appear in our
table of probabilities as .0000. If more decimal places are called for, we will
need more elementary events, but the principle is the same.

In a later section we shall learn how to compute the probability $p_n$ that n
tosses of a coin are required to obtain the first head. The formula is $p_n = 1/2^n$.
Thus p(H) = 1/2, p(TH) = 1/4, p(TTH) = 1/8, etc. A table (to 2 decimal places) is
as follows:


Number of tosses


Probability 


9 In the dice experiment, the addition of one extra sample point $s_\infty$ did not “increase the
number of sample points.” For example, we may renumber $s_\infty, s_1, s_2, \ldots$ as $t_1, t_2, t_3, \ldots$
($t_1 = s_\infty$, $t_2 = s_1$, etc.)
Thus an infinite sample space is reduced to one having 8 sample points. More
decimal places will require more sample points (and give better accuracy). 
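The cut-off used in such a table can be automated. In this Python sketch the rule is an assumption made here: keep $s_n$ while $p_n = 1/2^n$ would not round to zero at the chosen number of places, then lump all remaining points into one “n or more” point whose probability is the tail sum $1/2^{n-1}$. For 2 places this gives 8 sample points; for 4 places it gives 15. The function name is hypothetical.

```python
from fractions import Fraction

def collapsed_coin_table(decimals):
    """Reduce the infinite space p_n = 1/2**n to finitely many points.

    Keep s_1, s_2, ... while p_n is at least half a unit in the last
    decimal place, then lump all remaining points into a single
    "n or more" point."""
    half_unit = Fraction(1, 2 * 10**decimals)
    rows, n = [], 1
    while Fraction(1, 2**n) >= half_unit:
        rows.append((str(n), Fraction(1, 2**n)))
        n += 1
    # Tail probability: 1/2**n + 1/2**(n+1) + ... = 1/2**(n-1).
    rows.append((f"{n} or more", Fraction(1, 2**(n - 1))))
    return rows

for label, p in collapsed_coin_table(2):
    print(f"{label:>10}  {float(p):.2f}")
```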

Sample spaces even more complicated than discrete ones are possible, 
although we do not define them in this text. 


EXERCISES 


1. New York City is divided into 5 boroughs. The population (1960
census) was as follows:


Population of New York City (1960 Census), by Boroughs 


Borough Population (in thousands) 
Manhattan 1,698 
Bronx 1,425 
Brooklyn 2,627 
Queens 1,810 
Richmond 222 


Use this table to make a probability space of the boroughs. 


2. The following is a summary of a school budget for a city: 


A School Budget 
Account Percentage of total 

Salaries 74.44 
Debt service 5.35 
Instructional supplies 5.44 
Maintenance 2.26 
Capital outlay 2.24 
Transportation 1.91 
Operational expense 1.82 
Contingency fund 1.78 
Other 4.76 

Total 100.00 


Is this a probability space? Does this mean that the money is spent at 
random? Explain. 
3. The probability that an integer between 1 and 100 inclusive is a prime
number is .25. Restate this fact without using the language of probability.


4. It is desired to choose a positive integer at random so that each positive 
integer is as likely to be chosen as another. Show that this cannot be done. 
In brief, show that there is no such thing as a uniform, infinite, discrete 
sample space. 


CHAPTER 2 COUNTING 


INTRODUCTION 


Many important problems in probability theory involve sets with a rather 
large number of elements. Therefore, to compute probabilities it is necessary 
to learn how to count efficiently. The reader already knows how to do this 
in many cases. If a room is 12 by 13 ft, and if a square tile is 1 by 1 ft, then it 
is not necessary to count, one by one, how many squares are needed to tile 
the room. The answer, of course, is 12 X 13 = 156 tiles. In this chapter we 
shall learn several short cuts of this type and apply them to probability 
problems. 

The proper framework of counting theory is within the theory of sets. 
Indeed, counting is a way of assigning to any set A the number n(A) of 
elements in this set. We count things. Hence sets (collections of things) are
needed to understand counting.


1 PRODUCT SETS 


Product sets arise when the objects in a given set can be determined by two 
characteristics. A familiar example is a pack of cards. A card is determined 
by the suit (hearts, diamonds, clubs, or spades) and by the rank (A, 2, 3, 4, 5,
6, 7, 8, 9, 10, J, Q, K). Our way of describing a card (e.g., a 5 of hearts) makes
this explicit. The set of cards can be pictured as in Fig. 2.1. Thus the set
of cards is determined by two much simpler sets: the set of suits and the set
of ranks.

2.1 Cards (grid: rows Hearts, Diamonds, Clubs, Spades; columns A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K)

We have seen a very similar situation in Example 1.15 (Fig. 1.10). Here
the possible outcomes of a tossed pair of dice were also determined by 2
characteristics: the number on the first die and the number on the second.
Thus the possible outcomes were determined by 2 much simpler sets, the
set A = {1, 2, 3, 4, 5, 6} and B = {1, 2, 3, 4, 5, 6}. (In this case the sets were
identical.) This situation is greatly generalized in the following definition.


1 Definition 
If A and B are 2 sets, the product set $C = A \times B$ is defined as the set of all
ordered couples (a, b), where a is in A and b is in B.


Remark. An ordered couple (a,b) consists of 2 components. Its first 
component is a, its second is b. By definition, 2 ordered couples are equal if 
they agree in both of their components: 


$$(a, b) = (c, d) \text{ if and only if } a = c \text{ and } b = d \qquad (2.1)$$


For example, $(1, 3) \ne (3, 1)$ despite the fact that both of these ordered
couples have 1 and 3 as components. It is for this reason that they are called
ordered couples. The order in which the components are listed is a significant 
factor in determining the ordered couple. 

When we draw a diagram of the product set $A \times B$, we form a table by
putting the elements of A down in a column and B in a row. Then $A \times B$ will
be the body of the table, with (a, b) appearing in the row opposite a and 
column under b. (See Figs. 1.10 and 2.1 for examples.) 

Using Definition 1, we can give several additional examples of product
sets. If a coin is tossed, the outcomes may be taken as the 2 elements of the
set A = {H, T}. If a coin is tossed twice, the outcomes may be regarded
as the elements of $A \times A$ (see Fig. 2.2). If a coin is tossed and a card is chosen
from a standard deck, the outcomes may be regarded as the product of
{H, T} and of C, the set of cards. If we wish to classify automobiles
according to color and make, we might choose A = {black, blue, red, green, beige,
white, other} and B = {Chevrolet, Ford, Plymouth, Pontiac, Lincoln,
Volkswagen, other}. Then $A \times B$ will be an appropriate set to classify cars.
(We would expand A and B if we wanted more detail.) In the polltaker
example (Example 18 of Chapter 1), we might take A = the set of all 10
families. But (see Fig. 1.15) the set of possible outcomes was not $A \times A$,
because the diagonal squares [those of the form (a, a)] did not determine an
outcome.

2.2 Sample Space for 2 Coins

The following theorem on counting the elements in a product set has 
probably been known to the reader since his early childhood. 


2 Theorem 
For any sets A and B, 


$$n(A \times B) = n(A) \cdot n(B) \qquad (2.2)$$


We multiply the number of elements in A and the number in B to obtain
the number in $A \times B$. Any proof would be more confusing than looking at
Fig. 2.1, where 4 x 13 = 52. This theorem is so basic that it is often taken as 
the definition of multiplication in a theoretical treatment of multiplication. 
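Theorem 2 can be watched in action by building the product set of Fig. 2.1 explicitly. This Python sketch uses `itertools.product`, which lists every ordered couple (suit, rank); the suit and rank labels are spelled out here for illustration.

```python
from itertools import product

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# The product set suits x ranks: every ordered couple (suit, rank).
deck = list(product(suits, ranks))
print(len(deck), len(suits) * len(ranks))   # 52 52

# Ordered couples: (1, 3) and (3, 1) are different elements of {1, 3} x {1, 3}.
print((1, 3) == (3, 1))   # False
```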

The situation is entirely similar if the objects in a certain set can be deter- 
mined by more than two characteristics. In this case we form the product of 
two or more sets. 


3 Definition 


If $A_1, \ldots, A_s$ are sets, the product $C = A_1 \times A_2 \times \cdots \times A_s$ is defined as the
set of ordered s-tuples $(a_1, a_2, \ldots, a_s)$, where $a_i$ is in $A_i$ for $i = 1, 2, \ldots, s$.


Remark. As before, $a_1$ is called the first component, ..., $a_s$ the sth
component of the ordered s-tuple. Two s-tuples are equal if they are equal
component by component. A 3-tuple is called a triple and a 2-tuple a couple.

Some examples of s-tuples have been encountered before. In Example 
1.16 (5 tossed coins), each outcome was a 5-tuple, where each component 
was from the set A = {H, T}. The set of outcomes was therefore
$C = A \times A \times A \times A \times A$. In analogy with Theorem 2 (see Theorem 4),
$n(C) = 2 \times 2 \times 2 \times 2 \times 2 = 2^5 = 32$, as explicitly noted in Fig. 1.13. If 4 dice
are tossed, the possible outcomes may be regarded as elements of $B \times B \times B \times B$,
where B = {1, 2, 3, 4, 5, 6}. Each component is merely the number
appearing on the corresponding die. Here ordered 4-tuples are wanted,
because, for example, (1, 2, 4, 1) is to be distinguished from (4, 1, 2, 1).

We may regard a product of 3 sets as the product of the first 2 sets times 
the third: 

$$A \times B \times C = (A \times B) \times C$$


For example, to find all the outcomes when 3 coins are tossed, we choose
A = {H, T}. We then consider the outcomes as completely determined by
the outcome on the first 2 coins and on the third coin; i.e., we consider
$(A \times A) \times A$. We may systematically proceed as in Fig. 2.3.


2.3 Three-Coin Sample Space

          H      T
  HH     HHH    HHT
  HT     HTH    HTT
  TH     THH    THT
  TT     TTH    TTT


In analogy with Theorem 2, we have the following useful theorem. 


4 Theorem 
For any sets $A_1, \ldots, A_s$, we have


$$n(A_1 \times A_2 \times \cdots \times A_s) = n(A_1) \cdot n(A_2) \cdots n(A_s) \qquad (2.3)$$


The proof is by successive application of Theorem 2, because a product of 
s sets may be defined in terms of a product taken 2 at a time. 
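The proof idea, building a product of s sets two at a time, can be sketched in Python. The helper `product_of` is hypothetical: it starts from the set containing only the empty tuple and repeatedly forms (previous tuples) × (next set), exactly as in $(A \times A) \times A$. For three coins it lists the outcomes in the same order as the rows of Fig. 2.3.

```python
from functools import reduce
from itertools import product

def product_of(*sets):
    """Form A_1 x A_2 x ... x A_s two sets at a time, as in (A x B) x C.

    Each step pairs an existing tuple with one new component and flattens
    the result, so the final elements are ordered s-tuples."""
    def times(left, right):
        return [t + (x,) for t, x in product(left, right)]
    return reduce(times, sets, [()])   # start from the set holding only the empty tuple

A = ["H", "T"]
three_coins = ["".join(t) for t in product_of(A, A, A)]
print(three_coins)   # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

B = range(1, 7)
print(len(product_of(A, A, A, A, A)), len(product_of(B, B, B, B)))   # 32 1296
```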

Theorem 4 greatly expands our ability to count; hence the range of prob- 
ability problems (using uniform spaces) is enlarged. The examples below 
illustrate some of the possibilities. 

In all the examples to follow, the motivation for using product sets is as 
follows. An outcome is determined by what happens at various “stages.” At 
the first stage, any possibility from set $A_1$ may occur. Regardless of what
happens at the first stage, at the second stage any possibility from set $A_2$ may
occur; similarly for later stages. The outcomes constitute the product set
$A_1 \times \cdots \times A_s$, and the first component of an outcome is what occurred at the
first stage, etc.


5 Example 
Three dice are tossed. What is the probability that no 6 turns up? That 
neither a 5 nor a 6 turns up? 

Let $A_6 = \{1, 2, 3, 4, 5, 6\}$. The sample space S consists of triples
$(x, y, z)$, each number chosen from $A_6$. Thus $S = A_6 \times A_6 \times A_6$. Hence
$n(S) = 6^3 = 216$. Now let $A_5$ be the set {1, 2, 3, 4, 5} and let $E_5$ be the event
“no 6 turns up”; then $E_5$ consists of all the triples (x, y, z), where each of
x, y, z is in $A_5$. Thus $E_5 = A_5 \times A_5 \times A_5$. Using Theorem 4, $n(E_5) = 5^3 = 125$.
Thus the probability of $E_5$ may be found using the basic Theorem 14 of Chapter 1:

$$p(E_5) = \frac{n(E_5)}{n(S)} = \frac{5^3}{6^3} = \frac{125}{216} \quad (= .579)$$

Similarly, if $A_4$ is the set {1, 2, 3, 4}, the event $E_4$, “neither a 5 nor a 6 turns
up,” is the set $A_4 \times A_4 \times A_4$ and has probability

$$p(E_4) = \frac{4^3}{6^3} = \frac{64}{216} \quad (= .296)$$
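Because $S = A_6 \times A_6 \times A_6$ has only 216 elements, Example 5 can also be checked by listing them all. In this Python sketch the helper `prob` is a hypothetical name; it counts the triples satisfying an event and divides by n(S), which is Theorem 14 of Chapter 1 for a uniform space.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=3))   # all 216 ordered triples (x, y, z)

def prob(event):
    """Uniform-space probability: (number of triples in the event) / n(S)."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

p_no_six = prob(lambda t: 6 not in t)
p_no_five_or_six = prob(lambda t: max(t) <= 4)
print(p_no_six, p_no_five_or_six)   # 125/216 8/27
```

Note that 64/216 prints in lowest terms as 8/27.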


6 Example 
Three dice are tossed. What is the probability that the high number is 6? 
Similarly, find the probability that the high number is 5, 4, 3, 2, or 1. 

The method of Example 5 shows that there are $6^3 = 216$ possible out-
comes, of which $5^3 = 125$ outcomes had no 6. There are 216 - 125 = 91
outcomes with a 6, and thus having a high of 6. Similarly, there are $5^3 = 125$
outcomes with a high of 5 or less (no 6), but $4^3 = 64$ of these outcomes have
a high of 4 or less (no 5 or 6). Thus 125 - 64 = 61 outcomes have a high of 5.
Continuing in this way we obtain Table 2.4.


2.4 High Die When 3 Dice Are Tossed
Outcome   Number of outcomes   Probability   Probability (percent)
6 high    $6^3 - 5^3 = 91$       91/216        42.1
5 high    $5^3 - 4^3 = 61$       61/216        28.2
4 high    $4^3 - 3^3 = 37$       37/216        17.1
3 high    $3^3 - 2^3 = 19$       19/216         8.8
2 high    $2^3 - 1^3 = 7$         7/216         3.2
1 high    $1^3 = 1$               1/216         0.5


These results were given without proof in Table 1.3.
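A brute-force count agrees with the subtraction argument. This Python sketch tallies the highest number in each of the 216 outcomes with `collections.Counter` and asserts, for each h, that the count is $h^3 - (h-1)^3$.

```python
from collections import Counter
from itertools import product

# Count the 216 outcomes of 3 dice by their highest number.
highs = Counter(max(t) for t in product(range(1, 7), repeat=3))
for h in range(6, 0, -1):
    # The text's argument: h**3 outcomes have a high of h or less.
    assert highs[h] == h**3 - (h - 1)**3
    print(h, highs[h], f"{100 * highs[h] / 216:.1f}%")
```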


7 Example 
In Fig. 2.5 how many paths can be drawn tracing the alphabet from A 
through H? 


2.5 Alphabet, with Path LRRRLRL (figure: the letters A through H in rows forming a triangle, a single A at the top and a row of 8 H's at the bottom; from each letter a path may step down to the left or to the right)


At each stage we have a choice of going left (L) or right (R). Furthermore,
any sequence of seven directions (L or R) gives a unique path, with two
different sequences giving different paths. Hence if we let A = {L, R}, the
paths may be regarded as the product set $P = A \times A \times A \times A \times A \times A \times A$,
and $n(P) = 2^7 = 128$.

If 7 coins are tossed, the same analysis is involved. To see the corre-
spondence, we may easily imagine starting at A and thinking “heads, we go
left; tails, we go right.”
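The paths can be generated directly as the product set $A^7$ with A = {L, R}. This Python sketch lists them, confirms that the path drawn in Fig. 2.5 is among them, and translates it into coin language with the text's rule “heads, we go left; tails, we go right.”

```python
from itertools import product

# Each path from A down to H is a sequence of 7 moves, L or R.
paths = ["".join(p) for p in product("LR", repeat=7)]
print(len(paths))            # 128
print("LRRRLRL" in paths)    # True: the path drawn in Fig. 2.5

# The coin correspondence "heads, we go left; tails, we go right".
to_coins = str.maketrans({"L": "H", "R": "T"})
print("LRRRLRL".translate(to_coins))   # HTTTHTH
```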


8 Example 


A roulette wheel has 38 (equally likely) slots, of which 18 are red, 18 black, 
and 2 green. What is the probability that 5 reds in a row turn up? 

The sample space S consists of sequences (x, y, z, u, v), where each
letter represents one of the 38 slots. Thus $n(S) = 38^5$. The event E, “all slots
red,” consists of 5-tuples in which each entry is 1 of 18 possibilities. Thus
$n(E) = 18^5$. Finally, it is reasonable to assume that S is a uniform prob-
ability space. Thus


$$p(E) = \frac{n(E)}{n(S)} = \frac{18^5}{38^5} = \left(\frac{18}{38}\right)^5 = \left(\frac{9}{19}\right)^5 \quad (= .0238)$$
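The arithmetic of Example 8 can be done exactly with Python's `fractions.Fraction`, which reduces $18^5/38^5$ to lowest terms automatically; the decimal is then rounded to 4 places.

```python
from fractions import Fraction

slots, red = 38, 18
# n(S) = 38**5 ordered 5-tuples of slots; n(E) = 18**5 of them are all red.
p = Fraction(red**5, slots**5)
print(p, round(float(p), 4))   # 59049/2476099 0.0238
```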
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9 Example 


A pack of cards is divided into 2 piles. One consists only of the 13 spades. 
The other contains all the other cards. A card is chosen from each pile. What 
is the probability that a spade picture card (J, Q, or K) and a heart are chosen?

If A is the set of spades and B the other cards, we have n(A) = 13 and
n(B) = 39. The sample space is $S = A \times B$, because any element from A may
be paired with any element of B. There are $n(A \times B) = 13 \times 39$ elements in
S. If we let P be the set of picture spade cards and H the set of hearts, we
wish to find the probability of $P \times H$. But $n(P \times H) = n(P) \times n(H) = 3 \times 13$.
We naturally assume a uniform space, so we obtain


$$p(P \times H) = \frac{3 \times 13}{13 \times 39} = \frac{1}{13}$$


(Note that it would be foolish to multiply out 13 and 39, because the factor
13 eventually cancels.)
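Example 9 is small enough to enumerate. In this Python sketch the cards are represented as (suit, rank) couples, an encoding chosen here for illustration; the sample space is the product of the two piles, and the favorable couples pair a spade picture card with a heart.

```python
from fractions import Fraction
from itertools import product

suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
A = [("spades", r) for r in ranks]                    # the 13 spades
B = [(s, r) for s in suits[1:] for r in ranks]        # the other 39 cards

S = list(product(A, B))    # one card from each pile: 13 x 39 couples
favorable = [(a, b) for a, b in S if a[1] in ("J", "Q", "K") and b[0] == "hearts"]
print(Fraction(len(favorable), len(S)))   # 1/13
```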


10 Example 
A set has 10 elements in it. How many subsets or events are there? 

Call the elements 1, 2, ..., 10. Then a subset is determined by deciding
whether 1 is in it or out of it, and similarly for 2, 3, ..., 10. If we let A =
{in, out} or {I, O}, we see that

$$P = \underbrace{A \times \cdots \times A}_{10 \text{ factors}}$$

may be regarded as the subsets of the given set. For example, I O O O I I I
O I O corresponds to the subset {1, 5, 6, 7, 9}. Thus there are $n(P) = 2^{10} =$
1,024 subsets. This includes the whole set, described by I I ... I, and the
empty set, described by O O ... O.

This example clearly generalizes to a set with n elements. Thus, if S has n
elements, there are $2^n$ subsets of S.
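The in/out encoding translates directly into code. This Python sketch (the function name `subsets` is hypothetical) runs through every element of $\{I, O\}^{10}$ and keeps the elements marked I, so each 10-letter pattern yields one subset.

```python
from itertools import product

def subsets(elements):
    """List every subset by deciding, element by element, 'in' (I) or 'out' (O)."""
    elements = list(elements)
    result = []
    for choice in product("IO", repeat=len(elements)):
        result.append({e for e, c in zip(elements, choice) if c == "I"})
    return result

all_subsets = subsets(range(1, 11))
print(len(all_subsets))   # 1024

# The pattern I O O O I I I O I O picks out {1, 5, 6, 7, 9}.
pattern = "IOOOIIIOIO"
chosen = sorted(e for e, c in zip(range(1, 11), pattern) if c == "I")
print(chosen)   # [1, 5, 6, 7, 9]
```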


EXERCISES 


1. Let A={H,T}. In analogy with Fig. 2.3, list all the elements of 
$A \times A \times A \times A$ in a 4 × 4 table, by treating this set as the product of $A \times A$
with itself. Using this table, find the probability that, when 4 coins are tossed, 
2 heads and 2 tails occur. 


2. A menu has a choice of 5 appetizers, 8 main dishes, 4 desserts, and 3 
beverages. How many different meals can be ordered if it is required to 
order one item in each category? How many meals can be ordered if it is 
required to order a main dish and a beverage but an appetizer and the dessert
are optional? 


3. Urn A contains 6 green and 5 white marbles. Urn B contains 7 green 
and 24 white marbles. One marble is chosen at random from each of the urns. 
What is the probability that both are green? That both have the same 
color? 


4. A die is tossed 4 times in a row. What is the probability that a 6 occurs 
on the fourth toss but not on the first 3 tosses? 


5. Let us define a finite sequence of letters to be a word. How many 
3-letter words are there? How many 3-letter words are there which begin 
and end with a consonant and have a vowel in the middle? (Treat Y as a 
consonant.) 


6. A man decides to exercise by walking 5 blocks daily from a street 
intersection near his home. Since he is in the middle of a city, once he is at 
the end of a block, he always has 4 directions to continue his walk. How 
many paths are possible? Express the set of paths as a product of appropriate 
sets. 


7. Suppose the man in Exercise 6 decides that once he walks a block he 
will not immediately go back along that same block. He is willing to take his 
daily 5-block walk as long as he can proceed along a different route every 
day. Will he stroll for a full year? Express the set of paths as a product of 
appropriate sets. In particular express the 2 paths in Fig. 2.6 as ordered 
s-tuples, using your answer. 


2.6 Two Paths 


8. Ten differently colored marbles are to be tossed into 3 urns, A, B, 
and C. In how many ways can this be done? (It is not necessary to evaluate 
the answer.) If the marbles are tossed in at random, what is the probability 
that A will be missed? 
9. A multiple-choice test has 3 questions on it, each with 3 possible
answers. A class of 30 takes the test. Each student answers all the questions. 
Show that the teacher will necessarily be able to find 2 identical papers 
regardless of how uninformed the class is. 


10. A house has 5 rooms, and the owner is going to paint the house. He 
has a choice of 6 colors that he may use in any room. How many color 
schemes does he have to consider? 


11. How many 4-digit numbers are there which do not contain any of the 
digits 0, 1, or 2? Of these numbers, how many end with the digit 7? 


12. A man has a penny, a nickel, a dime, a quarter, and a half-dollar. Using 
only these coins, how many different amounts can he form? 


13. A man has a 1-, a 2-, a 3-, a 5-, and a 10-cent stamp. Explain why
product sets cannot be used to compute the different amounts of postage 
possible with these stamps to obtain the same answer as in Exercise 12. 


2 MULTIPLICATION PRINCIPLE 


In Example 18 of Chapter 1 (the poll of 10 families), product spaces were 
not immediately usable, because the possibility of polling the same family 
twice was eliminated. Yet the method of choosing a family was a 2-stage 
affair. First, one family was chosen (10 possibilities). Then a different family 
was chosen (9 possibilities). The product 10 x 9 = 90 gave the number of 
possible choices. Here we may say that a choice is an ordered couple 
(x,y) except that the possibilities for y are determined by the value of x. 
We now generalize this procedure. 


11 Theorem. The Multiplication Principle 

Let A and B be sets, and let C be a subset of A x B. Suppose that A has n 
elements in it, and that for every element a of A there are exactly m ele- 
ments of the form (a, x) in C. Then 


$$n(C) = n \cdot m \qquad (2.4)$$


This theorem is illustrated in Fig. 2.7. Here the set C is indicated by the 
dots. A has n = 5 elements in it, and although B has 10 elements in it, there
are only 7 elements of C in each row. Thus m = 7, and C has 5 × 7 = 35
elements.
Theorem 11 may be restated in the following useful alternative way.


11’ Theorem

Suppose that an occurrence is determined in 2 stages. If there are n ways
in which the first stage can occur, and if for each choice of the first stage
the second stage can occur in m ways, there are n · m possible occurrences.

The proof of Theorem 11 is straightforward. If we call $a_1, a_2, \ldots, a_n$ the
elements of A, we have precisely m elements of C that start with $a_1$; there
are m elements of C that start with $a_2$; etc. In all we have

$$\underbrace{m + m + \cdots + m}_{n \text{ times}} = n \cdot m$$

elements of C, which is the result.

As with product spaces, it is essential in this counting process that differ- 
ent ordered couples correspond to different events. For example, if we wish 
to interview 2 families from among 20, there are 20 x 19 = 380 ways this 
can be done. (At the first stage, we have 20 possibilities. For each of these 
possibilities, there are 19 ways of selecting the second family.) However, 
this counting process distinguishes AB from BA. This makes sense if we consider
which family was interviewed first.
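The 380 interviews can be listed as the subset C of $A \times A$ consisting of couples with different components. This Python sketch numbers the families 1 to 20 purely for illustration, checks that every first choice leaves exactly m = 19 second choices (the hypothesis of Theorem 11), and confirms that AB and BA are counted separately.

```python
from itertools import product

families = range(1, 21)   # 20 families, numbered here only for illustration
# Ordered couples (first, second) with first != second: the subset C of A x A.
interviews = [(x, y) for x, y in product(families, families) if x != y]
print(len(interviews), 20 * 19)   # 380 380

# Each first choice leaves exactly m = 19 second choices (Theorem 11).
rows = {x: sum(1 for a, _ in interviews if a == x) for x in families}
print(set(rows.values()))   # {19}
print((1, 2) in interviews and (2, 1) in interviews)   # True: AB and BA both counted
```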

The generalization of Theorem 11 to more than 2 sets is fairly
straightforward. For later applications, however, we prefer to generalize Theorem
11’.


12 Theorem 


Suppose that an occurrence is determined in s stages. If there are $n_1$ ways
in which the first stage can occur, if for each choice in the first stage the
second stage can occur in $n_2$ ways, and if for each choice in the first two
stages the third stage can occur in $n_3$ ways, etc., then there are $n_1 \cdot n_2 \cdots n_s$
possible occurrences.

The proof merely uses Theorem 11 over and over, and we omit the details. 


13 Example


How many 3-digit numbers are there in which no digit is repeated? 

We think of choosing the number in 3 stages —the first, second, and then 
the third digit. There are 9 first choices, because we do not allow a number
to begin with 0. After the first digit is chosen, there are 9 possibilities for
the second, because any of the 9 remaining digits (now including 0) is available.
At the third stage, after the first 2 digits are chosen, there are 8 possibilities.
Thus there are 9 × 9 × 8 = 648 such numbers.

Problems such as these are sometimes done diagrammatically. In Fig. 2.8a,
the three digits are blanks to be filled in. The order of filling them is also
indicated. We then put the number of possibilities in the blanks and multiply
as in Fig. 2.8b.


2.8a    | | × | | × | |
Stage:   1     2     3

2.8b    |9| × |9| × |8| = 648 (Answer)
Stage:   1     2     3


If we were to start this problem with the third digit (the unit’s place), we 
would obtain 10 possibilities for the first stage and 9 for the second. How- 
ever, the number of possibilities for the third stage (the hundred’s place) is 
sometimes 8 (if 0 has already been chosen) and sometimes 7 (if 0 has not 
been chosen). Thus Theorem 12 does not apply with this procedure, because 
the theorem calls for a fixed number of possibilities at the third stage, 
regardless of which elements were chosen at the second stage. Therefore, 
in any problem involving choices at different stages, we usually take, as the 
first stage, the event with the most conditions placed on it. 
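A brute-force count confirms Example 13 and also shows why starting from the units digit does not fit Theorem 12. In this Python sketch, after the units and tens digits are fixed, the number of usable hundreds digits is 8 when 0 has been used and 7 otherwise, so it is not a fixed number.

```python
from collections import Counter

# Example 13 by brute force: 3-digit numbers with no repeated digit.
no_repeats = [n for n in range(100, 1000) if len(set(str(n))) == 3]
print(len(no_repeats))   # 648

# Starting from the units digit instead: after fixing units and tens,
# count how many hundreds digits (1-9) remain.
left = Counter(
    sum(1 for h in range(1, 10) if h not in (u, t))
    for u in range(10) for t in range(10) if t != u
)
print(left)   # Counter({7: 72, 8: 18})
```

The totals still agree (72 × 7 + 18 × 8 = 648), but only after splitting into cases, which is the “it depends” situation of Example 14.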


14 Example 


How many even 3-digit numbers are there in which no digit is repeated? 

In this problem we have conditions on the first digit (not 0) and the last 
(must be 0, 2, 4, 6, or 8). Whether we start with the first or third digit, we 
come to an ambiguous situation at the last stage. When this happens (we 
may call this an “it depends” situation), we proceed as follows. We start at 
the third digit (the unit’s place). There are two possibilities, which we con- 
sider separately: 1. The third digit is 0. 2. The third digit is 2, 4, 6, or 8. After 
disposing of the third digit, we go to the first digit and then to the second. The 
entire procedure is summarized as follows:


1. Third digit 0.                |9| × |8| × |1| = 72
                         Stage:   2     3     1

2. Third digit 2, 4, 6, or 8.    |8| × |8| × |4| = 256
                         Stage:   2     3     1
                         Total possibilities = 328


In brief, when an “it depends” situation arises, we ask “On what?” We
then break up the possibilities accordingly.

An alternative solution proceeds as follows. The odd integers with no
repetition can be computed according to the scheme


         |8| × |8| × |5| = 320
Stage:    2     3     1


By Example 13 there were 648 integers under consideration, so the remain-
ing 648 - 320 = 328 are even.
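Both solutions of Example 14 can be checked by listing the numbers. This Python sketch splits the even numbers by whether the units digit is 0, matching the two cases above, and counts the odd ones for the alternative solution.

```python
# Example 14 by brute force: even vs odd 3-digit numbers without repeats.
no_repeats = [n for n in range(100, 1000) if len(set(str(n))) == 3]
even = [n for n in no_repeats if n % 2 == 0]
units_zero = [n for n in even if n % 10 == 0]
print(len(units_zero), len(even) - len(units_zero), len(even))   # 72 256 328
print(len(no_repeats) - len(even))                                # 320 odd
```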


15 Example 
Three cards are taken from the top of a well-shuffled deck of cards. What 
is the probability that they are all black? Similarly, find the probability that 
there are 2, 1, or no black cards in the selection.

A natural sample space to consider is the set S of triples (x, y, z), where
x, y, z are different cards. Using the multiplication principle, there are
52 · 51 · 50 possibilities. To find the probability that all cards are black, we
must find the number of triples in S in which all components are black. Using
the multiplication principle again, there are 26 · 25 · 24 such possibilities.
Hence

$$p(\text{all cards are black}) = \frac{26 \cdot 25 \cdot 24}{52 \cdot 51 \cdot 50} = \frac{2}{17} \quad (= .118)$$

To find the probability that two of the cards are black, we consider 3
cases: The red card is the first, second, or third card chosen. (This is an
“it depends” situation.) If the red card is the first card, the multiplication
principle shows that there are 26 · 26 · 25 possibilities. The same answer is
found in the other 2 cases. Thus there is a total of 3 · 26 · 26 · 25 possibilities
with 1 red and 2 black cards, and


$$p(\text{exactly two blacks}) = \frac{3 \cdot 26 \cdot 26 \cdot 25}{52 \cdot 51 \cdot 50} = \frac{13}{34} \quad (= .382)$$

The number of possibilities for 2 black cards can also be computed by
the following 4-stage scheme (posed as questions): 1. Where is the red
card? 2. What red card appears? 3. What is the first black card that appears?
4. What is the next black card? In this manner we may obtain 3 · 26 · 26 · 25
possibilities directly.

The probabilities for only 1 black card, and for no black cards, need not
be computed anew. These cases are the same as the cases “2 red” and “all
red,” and these probabilities have already been found, if we reverse the
roles of red and black.

We summarize the results in Table 2.9. The reader should compare with
Exercise 4, Section 1 of Chapter 1.


2.9 Count of Black Cards Among 3 Cards Chosen at Random

Number of black cards       0       1       2       3
Probability               2/17   13/34   13/34    2/17
Probability (percent)     11.8    38.2    38.2    11.8
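The whole of Table 2.9 can be confirmed by running through all 52 · 51 · 50 = 132,600 ordered triples. In this Python sketch only color matters, so cards 0 to 25 are marked black and 26 to 51 red, an encoding chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# 52 cards; only the color matters here, so mark cards 0-25 black, 26-51 red.
is_black = [i < 26 for i in range(52)]

# Ordered triples of different cards, tallied by number of black cards.
counts = Counter(sum(is_black[c] for c in triple) for triple in permutations(range(52), 3))
total = sum(counts.values())
for k in range(4):
    print(k, Fraction(counts[k], total))   # 2/17, 13/34, 13/34, 2/17 for k = 0..3
```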


16 Example 
Five men and 5 women are invited to a dinner party. They quickly notice 
that the sexes alternate about the table. (Each person was assigned to a
specific seat. The table had 10 places.) When it was suggested that this was
no accident, the hostess claimed that it was in fact due to chance. Was she
telling the truth?

Here, there are 10 chairs, which we think of as the stages. We want to
find the probability of alternating sexes. Using the multiplication principle,
the sample space (all the possible arrangements) has
10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 elements. The number of arrangements in which
the sexes alternate about the table is found by the multiplication principle.
Going around the table from a fixed seat, we find that there are
10 · 5 · 4 · 4 · 3 · 3 · 2 · 2 · 1 · 1 possibilities. The probability is

$$p = \frac{10 \cdot 5 \cdot 4 \cdot 4 \cdot 3 \cdot 3 \cdot 2 \cdot 2 \cdot 1 \cdot 1}{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = \frac{1}{126} \quad (= .008)$$


In all probability, this sort of an arrangement was no accident. 
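The seat-by-seat count can be written as a short function and checked against brute force. In this Python sketch `alternating_count` (a hypothetical name) multiplies the number of choices at each seat exactly as in the text. Listing all 10! seatings would be slow, so the brute-force check uses a party of 3 men and 3 women, where there are only 720 seatings. With an even number of seats, alternation along the row of seats also gives alternation between the last seat and the first, so checking neighbors in a line suffices.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def alternating_count(k):
    """Seatings of k men and k women at 2k fixed places with the sexes alternating,
    counted seat by seat with the multiplication principle."""
    count = 2 * k                  # first seat: anyone; by symmetry, say a man
    same, other = k - 1, k         # people left of the first sitter's sex / the other sex
    for seat in range(2, 2 * k + 1):
        if seat % 2 == 0:          # even seats: opposite sex to seat 1
            count *= other
            other -= 1
        else:                      # odd seats: same sex as seat 1
            count *= same
            same -= 1
    return count

print(Fraction(alternating_count(5), factorial(10)))   # 1/126

# Brute-force check on a smaller party (3 men, 3 women): people 0-2 are men.
def alternates(order):
    return all((order[i] < 3) != (order[i + 1] < 3) for i in range(len(order) - 1))

brute = sum(1 for order in permutations(range(6)) if alternates(order))
print(brute, alternating_count(3))   # 72 72
```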


EXERCISES 


1. A drawer contains 4 black and 4 red socks. Neil grabs 2 socks at 
random. What is the probability that they match? 


2. How many 3-digit numbers are there? Of these, how many are there 
which 
a. begin with the digit 2, 3, or 4? 
b. begin with the digit 2, 3, or 4 and have no repeated digits?
c. have 1 digit repeated?
d. do not contain the digit 5?
e. contain the digit 5 and have no repeated digits?
f. contain the digits 5 and 9?
g. do not contain either of the digits 5 or 9?
h. contain the digit 5 or 9 but not both?


3. A box of diaper pins contains 20 good pins and 3 defective ones. 
Three diaper pins are chosen at random from the box. What is the probability 
that they are all good? 


4. Three cards are chosen at random from a deck of cards. What is the 
probability that none are aces? What is the probability that none are picture 
cards (jack, queen, or king)? 


5. Four people meet at a party and are rather surprised to discover that 2 
of them were born in the same month. Find the probability that 4 people 
chosen at random were born in different months. (You may assume, as a 
good approximation, that the months are equally likely.) 


6. In how many ways can 3 people sit at a lunch counter with 10 seats? 
How many ways can they sit so that no 2 of the people sit next to one 
another? 


7. Five dice are tossed. What is the probability that all 5 dice will turn up 
different numbers? 


8. Three cards are chosen at random from a standard deck. Find the 
probability of choosing 
a. different suits. 
b. all the same suit. 
c. different ranks. 


9. (Three-card poker) Three cards are chosen at random from a standard 
deck. Find the probability of choosing 
a. a royal flush (Q, K, A all of the same suit).
b. a straight flush (3 consecutive ranks, all of the same suit; we treat
an ace as a 1, and the jack, queen, and king as 11, 12, and 13,
respectively).
c. a flush (all cards of the same suit but not a royal flush or a straight 
flush). 
d. a straight (3 consecutive cards not all of the same suit; here the con- 
vention is that an ace may be regarded as either a 1 or a 14, and the 
convention for the picture cards is as in part b). 
e. 3 of a kind (all cards of the same rank). 
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f. a pair (2 cards of the same rank and 1 of a different rank). 
g. a bust (none of the above). 


10. An urn contains 7 red balls and 3 green ones. If 3 different balls are 
chosen from the urn, what is the probability that they are all red? 


11. In Example 15 the probability $p_3$ of choosing 3 black cards was
shown to be $\frac{2}{17}$. For reasons of symmetry, we noted that $p_1 = p_2$ and
$p_3 = p_0$. (Here $p_i$ is the probability of finding $i$ black cards.) Using these facts
alone, find $p_1$. (Hint: Use Equation 1.5, after you have identified the
appropriate sample space.)


12. Three people each choose a number from 1 to 10 at random. What is
the probability that at least 2 of the people chose the same number? 


13. A president, vice president, and treasurer are to be chosen from a 
club that has 12 people (A, B, C, D, E, F, G, H, I, J, K, L). How many 
ways are there to choose the officers if 

a. A and B will not serve together.
b. C and D will either serve together or not at all.
c. E must be an officer.
d. F is either president or will not serve.
e. I and J must be officers.
f. K must be an officer and he must serve with L and/or G.




3 PERMUTATIONS AND COMBINATIONS 


There is a joke that goes as follows. Gus: “There are 100 cows in that field.” 
Bob: “How do you know?” Gus: “I counted 400 feet and divided by 4.” 
Anyone who has ever tried to explain a joke knows that it is not worth doing. 
Nevertheless, the mathematics can be explained simply enough. Gus 
counted 400 legs. Hence each cow was counted 4 times. Thus there were 
400/4 = 100 cows. This illustrates the division principle. The strange feature 
is that it is actually useful. It is sometimes easier to count legs than cows, 
even if there are more legs. 


17 Theorem. The Division Principle 
Let S be a set with n elements, and suppose that each element of S
determines an element of a set T. Suppose further that each element of T is
determined by exactly r elements of S. Then there are $k = n/r$ elements of T.
In this theorem we think of using S to count the elements of T. The
hypothesis tells us that each element of T is counted r times.
In the joke above, S was the set of legs (n = 400). Each leg determined a
cow (an element of T, the set of cows). Each cow was determined by exactly
4 legs (r = 4). Therefore, there were $k = 400/4 = 100$ cows.

To prove Theorem 17, suppose the elements of T are $t_1, t_2, \dots, t_k$. Let $A_i$
be the set of elements in S which determine $t_i$. By hypothesis, $A_i$ has r
elements. But every element of S is in exactly one of the $A_i$'s. Therefore, the
number of elements in S is the sum of the numbers of elements in the $A_i$'s. Thus
$n = r + r + \cdots + r = kr$ (k terms). Hence $k = n/r$, which is the result.


18 Example 


In a set containing 10 elements, how many subsets are there which contain 3 
elements? 

Suppose the set is A = {1, 2, ..., 10}. We may form a subset of 3 elements
by first choosing any number in A (10 possibilities), then choosing another
in A (9 possibilities), and finally choosing the last number (8 possibilities).
There are $10 \cdot 9 \cdot 8$ ways of doing this. This ordered triple determines a
subset of 3 elements. But a subset, such as {2, 5, 7}, is determined by several
different ordered triples: (2, 5, 7), (5, 7, 2), etc. In fact, we can see that any
subset is determined by $3 \cdot 2 \cdot 1$ ordered triples. This is precisely the situation
of Theorem 17. The set S of ordered triples has $n = 10 \cdot 9 \cdot 8$ elements, and
the set T consists of subsets with 3 elements. Each element of S determines a
subset (an element of T), and each subset is determined by $r = 3 \cdot 2 \cdot 1$
elements of S. Thus there are $k = (10 \cdot 9 \cdot 8)/(3 \cdot 2 \cdot 1) = 120$ subsets with 3
elements.

This is a typical use of the division principle. We first count, using the 
multiplication principle. Then we consider how many repetitions of each 
case occur and divide by that number. 
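As a check on Example 18, here is a minimal Python sketch (our own illustration; the function name `count_by_division` is hypothetical) that carries out the division principle literally: it lists all $10 \cdot 9 \cdot 8$ ordered triples, records the subset each one determines, and confirms that every subset is determined by exactly $3! = 6$ triples.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def count_by_division(n, r):
    """Count r-element subsets of {1, ..., n} by the division principle."""
    # The set S: all ordered r-tuples without repetition (multiplication principle)
    ordered = list(permutations(range(1, n + 1), r))
    # Each ordered tuple determines the subset of its entries (an element of T)
    hits = Counter(frozenset(t) for t in ordered)
    # Hypothesis of Theorem 17: every subset is determined by exactly r! tuples
    assert all(count == factorial(r) for count in hits.values())
    # Return k = n/r from the theorem, and the directly counted size of T
    return len(ordered) // factorial(r), len(hits)

print(count_by_division(10, 3))  # (120, 120)
```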

To generalize this example, it is convenient to use the language of sam- 
pling. Sampling occurs when we have a population and we wish to choose 
some members of this population in order to observe some of their charac- 
teristics. Some examples are as follows. 

1. A polltaker wants to see how the people in a large city feel about an 
issue. Usually, only relatively few people (the sample) are interviewed. 

2. A quality-control expert might want to find out how close to specifica- 
tions a factory-produced transistor is. He has a pile of 5,000 transistors. He 
chooses some of these (his sample) and tests them. 

3. When we throw 5 dice, we have a sample of size 5 from the population 
{1,2,3,4, 5, 6}. 

4. A poker hand (5 cards) is a sample of size 5 from a population of 52. 

5. A class of 25 in a school whose enrollment is 600 may be considered a 
sample. 

The general situation may be described as follows. We start with a popula- 
tion, which is a set P. We then choose a sample of size r. There are two 
broad categories: (1) ordered or unordered, and (2) with or without
replacement. An ordered sample of size r is merely an ordered r-tuple $(a_1, \dots, a_r)$ of
elements of P. We have an unordered sample if we agree to regard 2 r-tuples
as identical if one can be reordered into the other. In the 5-coin example, 
HHTHT and THHTH are different ordered samples of size 5 (from the 
population {H, T}) but are regarded as identical unordered samples. If we 
pick up 5 cards to form a poker hand, it is customary to regard this hand as 
an unordered sample. We usually do not care which card was picked up first, 
second, etc. 

A sample $(a_1, \dots, a_r)$ without replacement occurs when the elements $a_i$
are distinct. We imagine an urn filled with marbles and we pick r different 
marbles. Similarly, when we choose 5 cards, they are different. In a sampling 
with replacement, repetitions are allowed. In an urn situation, after we pick 
a marble, we put it back before we pick another. A policeman who tickets 
motorists (the population) does it in order (his tickets are numbered), and 
he is not prohibited from ticketing the same fellow twice (a sampling with 
replacement). 

In many cases the decision as to whether a sampling is to be regarded as 
ordered or not, or even with or without replacement, is somewhat arbitrary 
and depends upon the intended use. For example, in the 5-coin example 
(Example 1.16) we decided on an ordered sample because it seemed 
reasonable that these were equally likely. 

Using the language of sampling, Example 18 called for the number of 
unordered samples of size 3 without replacement from a population of 
size 10. To generalize this example, we introduce the following definition. 


19 Definition 
$$n! = 1 \cdot 2 \cdot 3 \cdots n \tag{2.5}$$


The notation n! is read “n factorial.” The definition does not imply that
n must be larger than 3. Rather, starting at 1, we keep multiplying successive
numbers until we reach n. Equivalently, $n! = n(n-1)\cdots 1$. Thus $1! = 1$,
$2! = 2 \cdot 1 = 2$, $3! = 3 \cdot 2 \cdot 1 = 1 \cdot 2 \cdot 3 = 6$, etc. Table 2.10 is a brief table
of factorials. It is seen that n! grows very fast with n. For example,
$52! = 8.066 \times 10^{67}$ to 4 significant figures. This is the number of possible ways of
arranging a pack of cards.


2.10 Values of n!, 1 ≤ n ≤ 10

 1! = 1          6! = 720
 2! = 2          7! = 5,040
 3! = 6          8! = 40,320
 4! = 24         9! = 362,880
 5! = 120       10! = 3,628,800
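Table 2.10 and the value of 52! quoted above can be reproduced with Python's standard `math.factorial`; this is only a convenience check, not part of the text's method.

```python
from math import factorial

# Table 2.10: n! for 1 <= n <= 10
table = {n: factorial(n) for n in range(1, 11)}
for n, value in table.items():
    print(f"{n:>2}! = {value:,}")

# 52!, the number of ways of arranging a pack of cards, to 4 significant figures
print(f"52! = {factorial(52):.3e}")  # 52! = 8.066e+67
```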
20 Theorem

If S has n elements, there are n! ways of arranging S in order. Equivalently, 
there are n! ordered samples of size n without replacement from a population 
of size n. 

This theorem is an immediate consequence of the multiplication principle 
and needs no further proof. 

An ordered sample of size r without replacement is also called a permuta- 
tion of size r. Thus we may say that there are n! permutations of size n 
using a population of size n. If smaller samples are considered, we use the 
following definition and theorem. 


21 Definition
If r ≤ n,
$${}_nP_r = \underbrace{n(n-1)\cdots(n-r+1)}_{r\ \text{factors}} \tag{2.6}$$

Thus ${}_{10}P_3 = 10 \cdot 9 \cdot 8$. In the symbol ${}_nP_r$, n is used to start the
multiplication, and we successively reduce each factor by 1. The number of factors is
r. Thus ${}_{10}P_3 = 10 \cdot 9 \cdot 8$ and ${}_{50}P_{17} = 50 \cdot 49 \cdots 34$. Here the factor 34 was
computed as in Equation 2.6: $50 - 17 + 1 = 34$. The letter P is used to recall
the word “permutation.”
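Equation 2.6 translates directly into a product of r decreasing factors. The sketch below is illustrative; the function name `n_P_r` is hypothetical, and Python's built-in `math.perm` (Python 3.8 and later) is used only as an independent check.

```python
from math import perm, prod

def n_P_r(n, r):
    """nPr as in Equation 2.6: r factors, starting at n and decreasing by 1."""
    return prod(range(n, n - r, -1))

print(n_P_r(10, 3))                       # 720
# The last of the 17 factors of 50P17 is 50 - 17 + 1 = 34
print(list(range(50, 50 - 17, -1))[-1])   # 34
print(n_P_r(50, 17) == perm(50, 17))      # True
```

For r = 0 the product is empty and equals 1, which anticipates the convention adopted later for ${}_nP_0$.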


22 Theorem 

If S has n elements, the number of ordered samples of size r without
replacement is ${}_nP_r$. Equivalently, the number of permutations of size r from a
population of size n is ${}_nP_r$.

Theorem 22 is also an immediate consequence of the multiplication
principle. In fact, Theorem 20 is a special case of Theorem 22, using
$${}_nP_n = n! \tag{2.7}$$
This equation is an immediate consequence of Equation 2.6 and Definition
19.

An alternative form for expressing ,P, is obtained by multiplying the 
numerator and denominator of the right-hand side of Equation 2.6 by 
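(n − r)!. For example,
$${}_{10}P_3 = 10 \cdot 9 \cdot 8 = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = \frac{10!}{7!}$$
In general,
$${}_nP_r = \frac{n!}{(n-r)!} \tag{2.8}$$

A brief examination of Table 2.10 shows that Equation 2.8 is useful for
theoretical purposes only. We do not calculate ${}_{10}P_2$ by dividing 10! by 8!.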
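A quick numerical check (Python, standard library only) that Equation 2.8 agrees with the product form of Equation 2.6, and an illustration of the remark about Table 2.10: the factorial route passes through the 7-digit number 10! to obtain ${}_{10}P_2 = 90$, while the product route needs only $10 \cdot 9$.

```python
from math import factorial, perm

# Equation 2.8, n!/(n-r)!, agrees with the product n(n-1)...(n-r+1) computed by math.perm
for n in range(1, 11):
    for r in range(1, n + 1):
        assert factorial(n) // factorial(n - r) == perm(n, r)

# 10P2 two ways: 3,628,800 / 40,320 versus the two-factor product
via_factorials = factorial(10) // factorial(8)
via_product = 10 * 9
print(via_factorials, via_product)  # 90 90
```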
An unordered sample of size r without replacement is called a combination
of size r. Equivalently, a combination of size r is the same as a set with
r elements. We may generalize Example 18 with the following definition and
theorem.


23 Definition
If 1 ≤ r ≤ n,
$${}_nC_r = \binom{n}{r} = \frac{{}_nP_r}{r!} = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 1} \tag{2.9}$$

Here the symbol ${}_nC_r$ is used in analogy with ${}_nP_r$. The letter C indicates
that combinations are being found. However, the symbol $\binom{n}{r}$ is now in rather
general use and we shall generally use it in this text. Note that $\binom{n}{r}$ is not the
fraction $\frac{n}{r}$. There is no horizontal bar, and parentheses must be used. For
example,
$$\binom{10}{3} = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1}, \qquad \binom{n}{2} = \frac{n(n-1)}{2 \cdot 1}, \qquad \binom{7}{1} = \frac{7}{1}$$


24 Theorem
If S has n elements, the number of unordered samples of size r without
replacement is $\binom{n}{r}$. Equivalently, there are $\binom{n}{r}$ subsets of S with r elements.

The proof of this theorem is exactly as in Example 18, and we spare the
reader the details. We note here that it is not even immediately evident from
Equation 2.9 that $\binom{n}{r}$ is an integer. Thus it may appear to be a coincidence
that $\frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 1}$ has enough cancellations to be a whole number. But this
is clear from Theorem 24.
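Theorem 24 can be spot-checked by listing subsets directly. This Python sketch (standard library only; the helper name `falling_product` is our own) counts the 3-element subsets of a 10-element set with `itertools.combinations`, compares the count with Equation 2.9, and confirms for small n the integrality remark above: $r!$ always divides $n(n-1)\cdots(n-r+1)$.

```python
from itertools import combinations
from math import comb, factorial, prod

def falling_product(n, r):
    # n(n-1)...(n-r+1): the numerator of Equation 2.9
    return prod(range(n, n - r, -1))

# Count the 3-element subsets of {1, ..., 10} by listing them
listed = sum(1 for _ in combinations(range(1, 11), 3))
print(listed, falling_product(10, 3) // factorial(3), comb(10, 3))  # 120 120 120

# The numerator of Equation 2.9 is always divisible by r!
for n in range(1, 21):
    for r in range(1, n + 1):
        assert falling_product(n, r) % factorial(r) == 0
```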


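If we use Equations 2.8 and 2.9, we derive an alternative formula for
$\binom{n}{r}$. Thus
$$\binom{n}{r} = \frac{{}_nP_r}{r!} = \frac{n!/(n-r)!}{r!}$$
and we have
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} \tag{2.10}$$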
For example, $\binom{10}{3} = 10!/3!\,7!$. By the same formula, however,
$\binom{10}{7} = 10!/7!\,3!$. Thus $\binom{10}{7} = \binom{10}{3}$. In general we have
$$\binom{n}{r} = \binom{n}{n-r} \tag{2.11}$$
To prove this we use Equation 2.10. We have
$$\binom{n}{n-r} = \frac{n!}{(n-r)!\,(n-(n-r))!} = \frac{n!}{(n-r)!\,r!} = \binom{n}{r}$$
which completes the proof.

Equation 2.11 may also be proved in the following way. To choose a sub- 
set of r from a set of n things, it suffices to choose the (n —r) objects that are 
not to be in the set. Thus, to specify which 15 of 18 books are to be taken 
from a library, it is only necessary to specify which 3 are to be left behind. 
The number of possibilities is the same in either case. 
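The complement argument can be made concrete. In this Python sketch (the helper name `complement_counts` is hypothetical), taking complements within an 18-element set sends the 3-element subsets (the books left behind) one-to-one onto the 15-element subsets (the books taken).

```python
from itertools import combinations

def complement_counts(n, r):
    """Pair each r-subset of {0, ..., n-1} with its (n-r)-element complement."""
    universe = frozenset(range(n))
    small = {frozenset(c) for c in combinations(range(n), r)}
    large = {frozenset(c) for c in combinations(range(n), n - r)}
    # Taking complements maps the r-subsets onto the (n-r)-subsets, one to one
    assert {universe - s for s in small} == large
    return len(small), len(large)

print(complement_counts(18, 3))  # (816, 816)
```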

It is convenient to define 0!, ${}_nP_0$, and $\binom{n}{0}$. This is a convention, similar to
the one used in algebra to define $a^0 = 1$. The decision is governed by an
attempt to make our formulas true for r = 0. We thus define
$$0! = 1, \qquad {}_nP_0 = 1, \qquad \binom{n}{0} = {}_nC_0 = 1 \tag{2.12}$$

These definitions make Equations 2.7 through 2.11 true.
Tables 2.11 and 2.12 give brief tables of ${}_nP_r$ and $\binom{n}{r}$ for small values of n
and r.

2.11 Values of ${}_nP_r$, 1 ≤ r ≤ n ≤ 8

 n\r       1       2       3       4       5       6       7       8
  1        1
  2        2       2
  3        3       6       6
  4        4      12      24      24
  5        5      20      60     120     120
  6        6      30     120     360     720     720
  7        7      42     210     840   2,520   5,040   5,040
  8        8      56     336   1,680   6,720  20,160  40,320  40,320
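Python's `math` module happens to follow the same conventions as Equation 2.12 (`math.factorial(0)`, `math.perm(n, 0)` and `math.comb(n, 0)` all return 1), so Table 2.11 can be regenerated with a sketch like the following; replacing `perm` by `comb` would give the entries of Table 2.12 instead.

```python
from math import comb, factorial, perm

# The conventions of Equation 2.12
assert factorial(0) == 1 and perm(8, 0) == 1 and comb(8, 0) == 1

# Table 2.11: nPr for 1 <= r <= n <= 8
for n in range(1, 9):
    row = [perm(n, r) for r in range(1, n + 1)]
    print(f"{n:>2}", " ".join(f"{v:>7,}" for v in row))
```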
We conclude this section with an example that illustrates one use of 
combinations. 


25 Example 


Ten coins are tossed. What is the probability that 5 are heads and 5 are tails?
Similarly, find the probability of r heads (0 ≤ r ≤ 10).
Let A = {H, T}. We take the sample space to be $A \times A \times \cdots \times A$ (10
factors). Equivalently, a sample point is an ordered sample of size 10 with
replacement from the set A. There are $2^{10} = 1{,}024$ sample points, which we
take to be equally likely. We wish to count how many have 5 heads. A
sample point with 5 heads is specified by determining where those heads
occur. Thus any subset of size 5 of {1, 2, ..., 10} determines a sample with
exactly 5 heads if we think of the numbers as determining a location. For
example, the subset {2, 4, 5, 7, 10} determines the sample point THTHHTHTTH,
and conversely. Since there are $\binom{10}{5}$ subsets of size 5 from a population
of size 10, we have exactly $\binom{10}{5} = 252$ of the required sample points.
The probability is
$$p(5H) = \frac{252}{1{,}024} = 0.246 \quad (24.6\%)$$

In the same manner, we may find that the probability of r heads is ${}_{10}C_r/2^{10}$,
r = 0, 1, ..., 10. The results are summarized in Table 2.13. The probabilities
are indicated graphically in Fig. 2.14. The reader should compare these
probabilities with his experimental results for Exercise 7, Section 1 of
Chapter 1.

2.13 Number of Heads Among 10 Coins

Number of heads           0     1     2     3     4     5     6     7     8     9    10
Probability (percent)   0.1   1.0   4.4  11.7  20.5  24.6  20.5  11.7   4.4   1.0   0.1
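Table 2.13 can be recomputed from the formula ${}_{10}C_r/2^{10}$, and the count of 252 can be confirmed by running through all 1,024 sample points. This is a Python sketch using only the standard library.

```python
from itertools import product
from math import comb

# Probability of r heads among 10 fair coins (Example 25): C(10, r) / 2**10
probs = [comb(10, r) / 2**10 for r in range(11)]
for r, p in enumerate(probs):
    print(f"{r:>2} heads: {100 * p:4.1f}%")

# Brute force over the 2**10 equally likely sample points
five_heads = sum(1 for point in product("HT", repeat=10) if point.count("H") == 5)
print(five_heads)  # 252
```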


EXERCISES 
1. Evaluate, without using tables: 
a. 6!    b. 8!/5!
c. 3P,    d. so?
e. 2C3    f. g0P7/siP,
g. ${}_{50}C_{46}$    h. (7 )/(9)
i. (3!)!    j. 8!/(4!)
[Fig. 2.14: Number of Heads Among 10 Tossed Coins]


2. Express each of the following using factorial notation. 


a. wPs    b. $51 \cdot 50!$
c. $20 \cdot 19 \cdot 18!$    d. ${}_{20}C_{16}$
e. $n(n-1)$    f. $(n+2)(n+1)(n)$
g. $2 \cdot 4 \cdot 6 \cdot 8 \cdot 10 \cdots 48$    h. $1 \cdot 3 \cdot 5 \cdot 7 \cdots 21$
3. Add, expressing your answer in factorial notation, as far as is feasible. 
ty 
* 10! I! 
| | 
> T1917 610! 
P 18! 18! 
" 810! TIL! 


4. Using Equation 2.10, prove that
$$\binom{n}{r} + \binom{n}{r+1} = \binom{n+1}{r+1}$$
5. From a club with 12 members, how many committees of 4 may be 
found? 


6. There are 20 points on the circumference of a circle. It is desired to 
form triangles choosing vertices from these points. How many triangles 
can be formed? 


7. In Exercise 6 it is decided that a given point must be used as a vertex 
for the triangle. How many triangles are possible with this additional
restriction? 


8. The figure represents the layout of streets in a certain city. A man 
starts at A and walks along the streets to B always moving north or east. 


a. How many paths is it possible for him to take? 

b. If he insists on passing by the intersection C to view a certain
billboard, how many paths from A to B are available to him?

c. If he must pass the candy store located at D, how many paths (from 
A to B) are available to him? 


9. Each of the following may be regarded as a sample taken from a 
population. Give the population and the size of the sample, if possible. 
State whether the sample is ordered or unordered and whether it is with 
replacement or without replacement. In some cases there are several pos- 
sible answers. If so, give the possibilities and explain the ambiguity. 

a. A word in the English language. 

b. A 4-digit number. 

c. A poker hand (5 cards). 

d. The result of tossing 8 dice. 

e. The result of tossing 10 coins. 

f. A listing of the top 5 students, in order of ranking, from a graduating 
class. 
g. A baseball team. 
h. The final standings in the American League. 

i. The first-division teams in the National League. 
In Exercises 10 through 14, express your answer using the ${}_nP_r$, $\binom{n}{r}$, or n!
notation. It is not necessary to evaluate your answer.


10. If 3 cards are chosen from a deck of cards, what is the probability 
they are all black? 


11. How many possible poker hands (5 cards) are there if we do not 
count the order in which the cards are picked up? What is the probability 
that a poker hand consists only of spades? 


12. A certain experimenter wishes to discover the eating habits of ham- 
sters. He is sent 20 hamsters, of which 2 happen to have enormous appetites. 
He chooses 5 of the hamsters at random. What is the probability that he 
does not choose the big eaters? 


13. A man has 30 books, but only 20 can fit on his bookshelf. How many 
ways can he arrange his bookshelf? 


14. An urn contains 6 white balls and 9 black balls. Two balls are chosen 
at random. What is the probability that they are both black? (Do this prob- 
lem using ordered as well as unordered samples.) 


15. Any ordered n-tuple determines an ordered r-tuple; namely, its first 
r components. Using this fact, prove Equation 2.8 using the division prin- 
ciple and Theorem 20. Similarly, prove Equation 2.10 using the division 
principle, the multiplication principle, and Theorem 20. 


4 BINOMIAL COEFFICIENTS 


A brief inspection of Table 2.12 shows an interesting and useful property of
the numbers $\binom{n}{r}$. If 2 consecutive entries on one row are added together, the
sum is the entry on the next row, under the second summand. Referring to
Table 2.12, row n = 6, we see under columns r = 2 and 3 that 15 + 20 = 35,
the entry in row n = 7 and column 3. Thus $\binom{7}{3} = \binom{6}{2} + \binom{6}{3}$. We shall shortly
prove this in general, but let us now note that this fact gives an excellent
technique for constructing this table. For example, starting with the row
n = 3 (1, 3, 3, 1), we can construct the next row n = 4 by putting 1 at each
end and adding as indicated in Fig. 2.15. This process can be continued
indefinitely and in this way we can construct a table of $\binom{n}{r}$ without using the
definition of $\binom{n}{r}$ given by Equation 2.9.

[Fig. 2.15: Property of $\binom{n}{r}$. From the row n = 3 (1, 3, 3, 1), the row n = 4 (1, 4, 6, 4, 1) is obtained by putting 1 at each end and adding adjacent entries.]

In fact, the whole process can be started with the single entry 1
corresponding to n = 0, r = 0. [The more conservative reader might prefer
the row (1, 1), corresponding to n = 1.] Table 2.12 is called Pascal's triangle,
in honor of the great seventeenth-century mathematician. We shall refer to
the method illustrated in Fig. 2.15 as the Pascal method for generating the
numbers $\binom{n}{r}$. We now prove the general result.


26 Theorem
$$\binom{n+1}{r+1} = \binom{n}{r} + \binom{n}{r+1} \tag{2.13}$$

This equation can be proved directly using Equation 2.10 and some
algebraic manipulation. (This is Exercise 4 of Section 3.) We shall give an
alternative proof using Theorem 24. Suppose S is a set with n + 1 elements
and we wish to form a subset with r + 1 elements. There are $\binom{n+1}{r+1}$ such
subsets. Now fix element $a_1$ in S, and distinguish between the subsets that
contain $a_1$ or do not. If $a_1$ is to be in a subset, there are only r choices from the
remaining n elements of S, hence there are $\binom{n}{r}$ of the subsets with $a_1$ in them.
If $a_1$ is not to be in a subset, we must choose r + 1 elements from the
remaining n elements of S. There are $\binom{n}{r+1}$ ways of doing this. Summarizing,
we have

1. $\binom{n}{r}$ subsets of size r + 1 with $a_1$ in them.
2. $\binom{n}{r+1}$ subsets of size r + 1 with $a_1$ not in them.

Thus there are $\binom{n}{r} + \binom{n}{r+1}$ subsets of size r + 1 in all. But we also know
that there are $\binom{n+1}{r+1}$ subsets of size r + 1 in all. Hence we have the result.

Using Equation 2.13 we can prove the famous binomial theorem.


27 Theorem. The Binomial Theorem
$$(1+x)^n = 1 + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{n-1}x^{n-1} + x^n = \sum_{r=0}^{n}\binom{n}{r}x^r \tag{2.14}$$
For example, corresponding to row 5 of Table 2.12 we have
$$(1+x)^5 = 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5$$


To prove Formula 2.14 in general, we start with a known case, say n = 2.
We have $(1+x)^2 = 1 + 2x + x^2$. Multiply by (1 + x) to obtain
$(1+x)^3 = (1 + 2x + x^2)(1 + x)$. This multiplication may be performed as follows:

  1 + 2x +  x^2
  1 +  x
  -------------------
       x + 2x^2 + x^3
  1 + 2x +  x^2
  -------------------
  1 + 3x + 3x^2 + x^3

Thus $(1+x)^3 = 1 + 3x + 3x^2 + x^3$, which is Equation 2.14 for n = 3. Multiplying
again by (1 + x), we have $(1+x)^4 = (1 + 3x + 3x^2 + x^3)(1 + x)$. Let us
multiply as above, using the coefficients only.

  1  3  3  1
  1  1
  --------------
     1  3  3  1
  1  3  3  1
  --------------
  1  4  6  4  1

Thus $(1+x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4$. These examples show (and it is easy
to see in general) that to multiply by 1 + x, we slide the coefficients one over
to the left, and add. The results are the new coefficients, starting with the
constant term and proceeding through increasing powers of x. Therefore,
this step-by-step method of constructing the coefficients of $(1+x)^n$ is
identical to Pascal's method of constructing the numbers $\binom{n}{r}$. Since they both start
with (1, 2, 1) for n = 2, or (1, 1) for n = 1, and since the method gives a
unique answer for each n, the coefficients in $(1+x)^n$ are identical to the
numbers $\binom{n}{r}$ (r = 0, 1, ..., n). This completes the proof.
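The proof compares two procedures, and both fit in a few lines. In the Python sketch below (function names are hypothetical), `times_one_plus_x` multiplies a coefficient list by 1 + x with the slide-and-add step, while `pascal_next_row` applies the Pascal method of Fig. 2.15 (1 at each end, sums of adjacent entries in between). Starting from (1, 1) for n = 1, the two agree row by row with $\binom{n}{r}$.

```python
from math import comb

def times_one_plus_x(coeffs):
    """Multiply a polynomial, given by coefficients of increasing powers, by 1 + x."""
    shifted = [0] + coeffs  # x * p(x): every power goes up by one
    return [a + b for a, b in zip(coeffs + [0], shifted)]

def pascal_next_row(row):
    """Pascal method: put 1 at each end and add consecutive entries."""
    return [1] + [a + b for a, b in zip(row, row[1:])] + [1]

poly, row = [1, 1], [1, 1]  # n = 1
for n in range(2, 9):
    poly, row = times_one_plus_x(poly), pascal_next_row(row)
    assert poly == row == [comb(n, r) for r in range(n + 1)]
print(poly)  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
```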
Because of Theorem 27, the numbers $\binom{n}{r}$ are also called the binomial
coefficients.
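A more general form for Equation 2.14 involves $(x+y)^n$. Thus
$$(x+y)^n = x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n-1}xy^{n-1} + y^n = \sum_{r=0}^{n}\binom{n}{r}x^{n-r}y^r \tag{2.15}$$

This follows directly from Equation 2.14 by substituting y/x for x in that
formula and simplifying. Thus, from Equation 2.14, we obtain
$$\left(1+\frac{y}{x}\right)^n = 1 + \binom{n}{1}\frac{y}{x} + \binom{n}{2}\frac{y^2}{x^2} + \cdots + \frac{y^n}{x^n}$$
Multiplying by $x^n$ we have
$$x^n\left(1+\frac{y}{x}\right)^n = x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \cdots + y^n$$
Finally, the left-hand side may be simplified to $(x+y)^n$:
$$x^n\left(1+\frac{y}{x}\right)^n = \left[x\left(1+\frac{y}{x}\right)\right]^n = (x+y)^n$$
This proves Equation 2.15. This equation is also called the binomial theorem.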

We may give the following alternative proof of Equation 2.14. In the
expansion $(1+x)^n = (1+x)(1+x)\cdots(1+x)$ we multiply out as follows.
Choose either a 1 or an x in each factor (1 + x) and multiply. Then add the
results. The coefficient of $x^r$ is precisely the number of ways we can choose
r x's from among these n factors. By Theorem 24 this number is $\binom{n}{r}$.
Q.E.D.
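The alternative proof can be imitated by brute force: run through every way of picking a 1 or an x from each of the n factors and tally how many x's were picked. A Python sketch (the function name `expand_by_choices` is hypothetical):

```python
from itertools import product

def expand_by_choices(n):
    """Coefficients of (1 + x)^n, found by picking '1' or 'x' from each factor."""
    coeffs = [0] * (n + 1)
    for picks in product("1x", repeat=n):
        # This choice contributes one term x^k, where k is the number of x's picked
        coeffs[picks.count("x")] += 1
    return coeffs

print(expand_by_choices(5))  # [1, 5, 10, 10, 5, 1]
```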
If we substitute x = 1 into Equation 2.14, we obtain
$$2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} \tag{2.16}$$
This equation is not surprising. By Example 10 we know that there are $2^n$
subsets of a set with n elements; of these, $\binom{n}{0}$ have 0 elements, $\binom{n}{1}$ have 1 element,
$\binom{n}{2}$ have 2 elements, etc. (Theorem 24). Thus Equation 2.16 may be regarded as a way of
counting all the subsets of a set with n elements.
If we substitute x = −1 into Equation 2.14, we obtain
$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \cdots + (-1)^n\binom{n}{n} = 0 \qquad (n \ge 1) \tag{2.17}$$
Transposing the negative summands, we obtain
$$\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots \tag{2.18}$$
Here the sums continue on each side until the last term is either $\binom{n}{n}$ or
$\binom{n}{n-1}$. For example,
    1 + 10 + 5 = 5 + 10 + 1          (n = 5)
    1 + 15 + 15 + 1 = 6 + 20 + 6     (n = 6)

Equation 2.18 may be interpreted: “If a set S is nonempty with finitely many
elements, there are as many subsets with an even number of elements as
with an odd number.” In the language of probability, if a subset is chosen at
random from a nonempty finite set, it will have an even number of elements
with probability 1/2.
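Equations 2.16 and 2.18 can be checked by listing every subset of a small set and sorting the subsets by the parity of their size. This Python sketch (hypothetical helper `subsets_by_parity`) does exactly that.

```python
from itertools import combinations

def subsets_by_parity(n):
    """Count subsets of {0, ..., n-1} of even size and of odd size by listing them."""
    even = odd = 0
    for k in range(n + 1):
        for _ in combinations(range(n), k):
            if k % 2 == 0:
                even += 1
            else:
                odd += 1
    return even, odd

for n in range(1, 8):
    even, odd = subsets_by_parity(n)
    assert even + odd == 2**n  # Equation 2.16
    assert even == odd         # Equation 2.18, nonempty set
print(subsets_by_parity(5), subsets_by_parity(6))  # (16, 16) (32, 32)
```

For n = 0 the function returns (1, 0): the empty set has only one subset, of even size, which is why Equation 2.18 requires a nonempty set.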

An important application of Equation 2.14 is to approximate powers of
numbers that are near 1. Thus to compute $(1.03)^8$ we may use this equation
to find
$$\begin{aligned}
(1.03)^8 = (1 + .03)^8 &= 1 + \binom{8}{1}(.03) + \binom{8}{2}(.03)^2 + \binom{8}{3}(.03)^3 + \cdots \\
&= 1 + (8)(.03) + (28)(.0009) + (56)(.000027) + \cdots \\
&= 1 + .240 + .0252 + .001512 + \cdots \\
&= 1.267 \quad \text{(to 3 decimal places)}
\end{aligned}$$

This method is useful for small x, provided nx is also small. For in that case,
the higher powers of x in Equation 2.14 rapidly become negligible, while
the coefficients $\binom{n}{1}$, $\binom{n}{2}$, $\binom{n}{3}$, etc., are not large enough to overtake the powers of
x. We may also use this technique for computing powers of numbers less
than 1. Here it is useful to use a variant of Equation 2.14. If we replace x by
−x, then we obtain
$$(1+(-x))^n = 1 + \binom{n}{1}(-x) + \binom{n}{2}(-x)^2 + \binom{n}{3}(-x)^3 + \cdots + (-x)^n$$
or
$$(1-x)^n = 1 - \binom{n}{1}x + \binom{n}{2}x^2 - \binom{n}{3}x^3 + \cdots + (-1)^n x^n \tag{2.19}$$
For example,
$$\begin{aligned}
(.98)^{10} = (1 - .02)^{10} &= 1 - \binom{10}{1}(.02) + \binom{10}{2}(.02)^2 - \binom{10}{3}(.02)^3 + \cdots \\
&= 1 - (10)(.02) + (45)(.0004) - (120)(.000008) + \cdots \\
&= 1 - .2 + .018 - .00096 + \cdots \\
&= .817 \quad \text{(to 3 decimal places)}
\end{aligned}$$
These examples show that Equations 2.14 and 2.19 are written with the
most important terms first, assuming x and nx are small. In many cases,
depending on the accuracy required, it is not necessary to write out all the
terms. In Section 1 of Chapter 4 we shall see how to deal with a case such
as $(1.002)^{2000}$, in which x = .002 is small but nx = 4 is moderate in size.
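The approximations above amount to truncating the sum in Equation 2.14 (with x negative for Equation 2.19). A Python sketch, with the hypothetical helper `binomial_partial_sum` summing the first few terms:

```python
from math import comb

def binomial_partial_sum(x, n, terms):
    """Sum of the first `terms` terms of (1 + x)^n = sum over r of C(n, r) x^r."""
    return sum(comb(n, r) * x**r for r in range(terms))

# (1.03)^8: four terms versus the exact power
print(round(binomial_partial_sum(0.03, 8, 4), 3), round(1.03**8, 3))     # 1.267 1.267
# (.98)^10 = (1 - .02)^10: Equation 2.19 is Equation 2.14 with x = -.02
print(round(binomial_partial_sum(-0.02, 10, 4), 3), round(0.98**10, 3))  # 0.817 0.817
# With nx = 4, four terms fall far short of (1.002)^2000
print(binomial_partial_sum(0.002, 2000, 4), 1.002**2000)
```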

The following example gives another, somewhat surprising, application of 
the binomial coefficients. 


*28 Example 
Eight dice are tossed. If the dice are identical in appearance, how many 
different-looking (distinguishable) occurrences are there? 

What is called for is the number of unordered samples of size 8 with 
replacement from a population of size 6. An “appearance” is completely 
and uniquely specified by the number $r_1$ of 1's, $r_2$ of 2's, etc. Clearly
$r_1 + r_2 + \cdots + r_6 = 8$, and $r_i \ge 0$. We use the following trick. We think of 8 balls
on a line and 5 dividers. In the following diagram, the balls are the O's and
dividers are the X's:

    OXOOXXOOOXOXO
    1 2  0 3  1 1

The 5 dividers break up the balls into 6 (ordered) groups, and the number in
the first group is taken to be $r_1$, etc. The above diagram has $r_1 = 1$, $r_2 = 2$,
$r_3 = 0$, $r_4 = 3$, $r_5 = 1$, $r_6 = 1$. Clearly, any sequence of 5 X's and 8 O's
determines the numbers $r_i$, and conversely. But the number of such sequences
is $\binom{13}{5} = 1{,}287$, because we must choose 5 places from among 13 to locate
the X's. Thus there are 1,287 different-looking throws of 8 dice.
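Example 28 can be confirmed in Python: `itertools.combinations_with_replacement` produces exactly the unordered samples with replacement, i.e., the distinguishable throws. The hypothetical helper `balls_and_dividers` writes a throw's counts $(r_1, \dots, r_6)$ in the O/X notation of the diagram.

```python
from itertools import combinations_with_replacement
from math import comb

def balls_and_dividers(counts):
    """Encode counts (r1, ..., r6) as O's (balls) separated by 5 X's (dividers)."""
    return "X".join("O" * c for c in counts)

print(balls_and_dividers([1, 2, 0, 3, 1, 1]))  # OXOOXXOOOXOXO

# Every distinguishable throw of 8 identical dice is a multiset of faces 1..6
looks = sum(1 for _ in combinations_with_replacement(range(1, 7), 8))
print(looks, comb(13, 5))  # 1287 1287
```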
We generalize this result in the following theorem. 


*29 Theorem 


If r is a nonnegative integer, the number of n-tuples $(r_1, r_2, \dots, r_n)$ (each $r_i$ is
an integer) satisfying the conditions

(a) $r_i \ge 0$ (i = 1, ..., n)

(b) $r_1 + r_2 + \cdots + r_n = r$

is
$$\binom{n+r-1}{n-1}$$
In the above example we had n = 6, r = 8. The proof of this theorem is
the same as that given in that example, and we omit it. As in the example, we
have the following alternative formulation.
*29' Theorem

The number of unordered samples of size r, with replacement, from a
population of size n is $\binom{n+r-1}{n-1}$.

In contrast with this result, we take note that the number of ordered
samples of size r, with replacement, from a population of size n, is simply $n^r$.
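For small n and r, Theorem 29 can be checked by brute force, and the contrast with the ordered count $n^r$ is easy to display. A Python sketch (the function name `count_solutions` is hypothetical):

```python
from itertools import product
from math import comb

def count_solutions(n, r):
    """Brute-force count of (r1, ..., rn) with each ri >= 0 and r1 + ... + rn = r."""
    return sum(1 for t in product(range(r + 1), repeat=n) if sum(t) == r)

for n in range(1, 5):
    for r in range(0, 7):
        assert count_solutions(n, r) == comb(n + r - 1, n - 1)  # Theorem 29

# Unordered versus ordered samples of size 8, with replacement, from 6 faces
n, r = 6, 8
print(comb(n + r - 1, n - 1), n**r)  # 1287 1679616
```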


EXERCISES 


1. Simplify, using Equation 2.13, 
a. $\binom{13}{5} + \binom{13}{6}$    b. $\binom{11}{5} + \binom{11}{6} + \binom{12}{5}$
n n x+2 x+2 
“: (",)+(") d. r3)+ (ey) 
2. Using the Pascal property twice, express $\binom{n+2}{r+2}$ as a sum involving
terms of the form $\binom{n}{s}$. (Hint: See Fig. 2.16.)

[Fig. 2.16: Entries of Pascal's triangle in columns r, r + 1, r + 2, leading down to row n + 2.]

3. In analogy with the proof of Theorem 26, the number $\binom{n+2}{r+2}$ is the
number of subsets of size (r + 2) of a set S with n + 2 elements. By singling
out two elements $a_1$ and $a_2$ of S and distinguishing cases, prove¹
$$\binom{n+2}{r+2} = \binom{n}{r} + 2\binom{n}{r+1} + \binom{n}{r+2}$$

¹ It is convenient to define $\binom{n}{r} = 0$ for r > n and r < 0. With this definition, Theorem 24 is
valid, and it is not necessary to restrict the size of r in equations such as this.

4. As in Exercise 3, prove that
$$\binom{n+3}{r+3} = \binom{n}{r} + 3\binom{n}{r+1} + 3\binom{n}{r+2} + \binom{n}{r+3}$$


5. Generalize Exercises 3 and 4 to prove that 


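$$\binom{n+b}{r+b} = \binom{n}{r}\binom{b}{0} + \binom{n}{r+1}\binom{b}{1} + \binom{n}{r+2}\binom{b}{2} + \cdots + \binom{n}{r+b}\binom{b}{b} \tag{2.20}$$
(Hint: It is convenient to think of a population of n pennies and b nickels.
How many subsets of r + b coins are there?)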


6. Prove Equation 2.20 using the equation $(1+x)^{n+b} = (1+x)^n(1+x)^b$
and the binomial theorem. (Hint: Compare the coefficients of $x^{r+b}$ on both
sides of this equation.)

7. Prove that
$$\binom{2n}{n} = \binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 \tag{2.21}$$
as a special case of Equation 2.20.

8. Using the binomial theorem and Table 2.12, expand each of the 
following. 
a. (at+b)° b. (c—d)? 
c. (14+2y)8 d. (2x+4y)4 
e. [x+ (y+z) ]3 


9. Write out the first 3 terms in the binomial expansion of each of the 
following: 
a. (1+.x)10° b. (x—y)*? 
c. (1—t)” d. (p+q)" 


10. Using the identity $(1+x)^{2n}(1-x)^{2n} = (1-x^2)^{2n}$, prove that
$$\binom{2n}{0}^2 - \binom{2n}{1}^2 + \binom{2n}{2}^2 - \cdots + \binom{2n}{2n}^2 = (-1)^n\binom{2n}{n}$$
(Hint: Compare the coefficients of $x^{2n}$ on both sides of this equation.)


11. Using the binomial expansion, evaluate each of the following numbers 
to 3 decimal places. 
a. (.97)? b. (1.0001) 1° 
C. (1 + 1Q~ 27) 100.000 d. ($82) 30 


12. A sample of size r is chosen from a population of size n. There are 4 
possibilities if we classify with respect to ordering and with respect to 
replacement. State the number of samples possible in each of the 4 cases. 
Give reasons or quote theorems to justify your answer. 
*13. An experiment has 4 possible outcomes. It is to be repeated 20


times and a frequency table is to be computed. How many possible tables 
are there? (It is not necessary to evaluate your answer.) 


*14. How many different (unordered) coin combinations are there using 
15 coins? (Only pennies, nickels, dimes, and quarters are to be used.) 


*15. How many solutions of the equation $x_1 + x_2 + x_3 = 10$ are there?
(Here $x_i$ is a nonnegative integer.) How many solutions are there if each $x_i$ is
a positive integer? (Hint: Write $x_i = 1 + y_i$. Then $y_i$ is nonnegative.) How
many solutions are there with $x_1 \ge 2$, $x_2 \ge 1$, $x_3 \ge 1$?


*16. A teacher has a class of 10 students. It is grading time and he must 


assign each student a grade of A, B, C, D, or F. 


a. In how many different ways may he grade the class? (Do not 
evaluate.) 

b. The dean requires a grade distribution. How many possible grade 
distributions are there? (A grade distribution only tells how many 
A’s, B’s, etc., are given.) 

c. Suppose the teacher insists upon giving at least 1 A, at least 3 C’s, 
and no F’s. How many different grade distributions are available to him? 


*17. Each student in a class of 10 is given an assignment to toss a die and
record the result. The students are unreliable, and cannot be depended
upon to carry out the assignment. A table is then prepared giving a frequency
count for each of the 6 possible outcomes.

a. By regarding this experiment as choosing an unordered sample of
size 10 or less, with replacement, from a population of 6, show that
the number of possible tables is
$$\binom{15}{5} + \binom{14}{5} + \binom{13}{5} + \cdots + \binom{5}{5}$$
b. If a student does not do the experiment, he may report this fact;
otherwise, he will report 1, 2, ..., 6. Thus the population may be
regarded as one of size 7. Using this idea, how many tables are possible?
c. Combining parts a and b prove that
$$\binom{16}{6} = \binom{15}{5} + \binom{14}{5} + \cdots + \binom{5}{5}$$


*18. Generalize part b of Exercise 17 to compute the number of solutions
of the inequality
$$r_1 + r_2 + \cdots + r_n \le r$$
Here the $r_i$ are taken to be nonnegative integers.
*19. Using the technique of Exercise 17, prove that
$$\binom{a}{a} + \binom{a+1}{a} + \binom{a+2}{a} + \cdots + \binom{a+n}{a} = \binom{a+n+1}{a+1} \tag{2.22}$$

Illustrate the result in Table 2.12. Write out Equation 2.22 for the special
cases a = 0, a = 1, and a = 2.


5 APPLICATIONS OF THE MULTIPLICATION AND DIVISION 
PRINCIPLES 


In Sections 2 and 3 we introduced the multiplication and division principles 
which led to the consideration of binomial coefficients. We now consider 
additional applications which use these principles in combination. 


*30 Example 
Ten dice are tossed. If we count the numbers 1 or 2 as low (L), 3 or 4 as 
middle (M), and 5 or 6 as high (H), what is the probability that 3 low, 3 
middle, and 4 high numbers appear? 

We note that this example is quite similar to Example 25 except that now
we have 3 possibilities (L, M, or H) at each toss, rather than 2 (heads or
tails). Let A be the set {L, M, H}. We choose as our sample space
$A \times A \times \cdots \times A$ (10 factors). There are $3^{10}$ sample points, which we take as equally
likely, because L, M, and H are all equally likely. To find the probability
of 3 low, 3 middle, and 4 high, we must count the number of ordered 10-
tuples that have 3 L’s, 3 M’s, and 4 H’s appearing somewhere in them.


There are $\binom{10}{3}$ ways to locate the L’s, and after these are placed, we have $\binom{7}{3}$
ways of placing the M’s. The remaining 4 places are then automatically
filled with the H’s. Thus there are $\binom{10}{3}\times\binom{7}{3}$ ways in which the result can
occur. Using Equation 2.10, we may write

$$\binom{10}{3}\binom{7}{3}=\frac{10!}{3!\,7!}\cdot\frac{7!}{3!\,4!}=\frac{10!}{3!\,3!\,4!}$$

The required probability is therefore

$$\frac{10!}{3!\,3!\,4!}\cdot\frac{1}{3^{10}} \qquad (= .0711)$$
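The count in Example 30 is small enough to verify by brute force. This Python sketch lists all $3^{10}$ points of the sample space, counts those of type $L^3M^3H^4$, and compares the result with 10!/3!3!4!; it is an illustration only, with variable names of our own choosing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

# Sample space A x A x ... x A (10 factors), A = {L, M, H}; all points equally likely
points = list(product("LMH", repeat=10))
target = Counter(L=3, M=3, H=4)
favorable = sum(1 for pt in points if Counter(pt) == target)

formula = factorial(10) // (factorial(3) * factorial(3) * factorial(4))
p = Fraction(favorable, len(points))
print(len(points), favorable, formula, round(float(p), 4))   # 59049 4200 4200 0.0711
```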


The form 10!/3!3!4! is quite similar in appearance to the binomial co-
efficients n!/r!(n−r)!. The following results show how this example leads
the way to a generalization of the binomial coefficients.
*31 Definition


If an ordered sample of size r from the population $S = \{s_1,\ldots,s_n\}$ has $r_1$ of
its components $s_1$, $r_2$ of its components $s_2$, etc., then we say that the sample
has type $s_1^{r_1}\cdots s_n^{r_n}$.

Clearly $r_1+\cdots+r_n=r$ and each $r_i \ge 0$. In the above example, S =
{L, M, H}, r = 10, and we were finding the probability that a sample was
of type $L^3M^3H^4$. Note that no multiplication is implied here (the “bases” $s_i$
are not even necessarily numbers), but the notation is suggestive. For
example, if S = {H, T} (for coins), the ordered sample HHTHHTH has
type $H^5T^2$. One simplification in notation is to omit zero exponents. Thus
in the above dice example, we might consider the type $L^3H^7$ rather than
$L^3M^0H^7$.

Theorem 29 or 29’ showed how many types there were. The following 
theorem shows us how many ordered samples there are of a given type. 


*32 Theorem
Let $S = \{s_1,\ldots,s_n\}$, let $r_i$ be a nonnegative integer for each i, and let
$r_1+\cdots+r_n=r$. Then the number N of ordered r-tuples of type $s_1^{r_1}\cdots s_n^{r_n}$ is

$$N=\frac{r!}{r_1!\,r_2!\cdots r_n!} \qquad (r_1+\cdots+r_n=r) \qquad (2.23)$$


To prove this theorem, we proceed as in Example 30. There are $\binom{r}{r_1}$ ways
of locating the $s_1$’s. Then there are $\binom{r-r_1}{r_2}$ ways of locating the $s_2$’s, etc. Thus,
by the multiplication principle and Equation 2.10, there are

$$N=\binom{r}{r_1}\binom{r-r_1}{r_2}\cdots\binom{r-r_1-\cdots-r_{n-1}}{r_n}
=\frac{r!}{r_1!\,(r-r_1)!}\cdot\frac{(r-r_1)!}{r_2!\,(r-r_1-r_2)!}\cdots\frac{(r-r_1-\cdots-r_{n-1})!}{r_n!\,0!}$$

because $r-r_1-\cdots-r_n=0$. After extensive cancellations we have the
result. (Another somewhat more direct proof is indicated in the exercises.)
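Equation 2.23 and the stage-by-stage product used in its proof can be compared directly. In the Python sketch below, `multinomial`, `multinomial_by_stages`, and `count_of_type` are illustrative names of ours; the last one counts ordered r-tuples of a given type by listing every r-tuple, so it is only practical for small r.

```python
from itertools import product
from math import comb, factorial, prod

def multinomial(*rs):
    """r!/(r_1! ... r_n!) with r = r_1 + ... + r_n (Equation 2.23)."""
    return factorial(sum(rs)) // prod(factorial(x) for x in rs)

def multinomial_by_stages(*rs):
    """Same count built as in the proof: C(r, r_1) C(r - r_1, r_2) ..."""
    remaining, total = sum(rs), 1
    for x in rs:
        total *= comb(remaining, x)   # choose the locations of the next symbol
        remaining -= x
    return total

def count_of_type(rs):
    """Brute force: ordered r-tuples over {0, ..., n-1} with rs[i] copies of i."""
    n, r = len(rs), sum(rs)
    return sum(1 for t in product(range(n), repeat=r)
               if all(t.count(i) == rs[i] for i in range(n)))

print(multinomial(3, 3, 4), multinomial_by_stages(3, 3, 4))   # 4200 4200
print(count_of_type((2, 1, 3)), multinomial(2, 1, 3))         # 60 60
```

With n = 2 the function reduces to the binomial coefficient, e.g. `multinomial(5, 2)` equals `comb(7, 5)`.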

Note that the number N of Equation 2.23 is a generalization of the 
binomial coefficients. (For the binomial coefficients we have n = 2.) The
numbers N of this equation are called multinomial (or k-nomial) coefficients, 
for reasons to be seen below. 

The binomial coefficients were used to count the number of subsets. This 
generalizes immediately to the following theorem, which is an alternative 
formulation of Theorem 32. 
*32’ Theorem


Let L have r elements. The number of ways L can be partitioned into n sub-
sets $L_1,\ldots,L_n$ which exhaust L, which have no elements in common, and
which have $r_1, r_2, \ldots, r_n$ elements, respectively ($r_1+\cdots+r_n=r$), is given
by Equation 2.23.


To see this, there are $\binom{r}{r_1}$ ways of choosing $L_1$, $\binom{r-r_1}{r_2}$ ways of choosing
$L_2$, etc. Hence we have the same calculation as in Theorem 32. In that
theorem we may regard L as the r locations in an ordered r-tuple, $L_1$ as the
locations of the $s_1$’s, etc. If we think of L as a population, then we may regard
the $L_i$’s as subpopulations. Theorem 32’ counts the number of ways of
splitting a population into subpopulations of specified sizes. For
example, if the teacher in Exercise 15 of Section 4 decides to assign 2 A’s,
3 B’s, and 5 C’s to his class of 10, he has 10!/2!3!5! = 2,520 ways of making the grade
assignments. We think of the class as the population, and we divide it into
three subpopulations of the three specified types. Equivalently (if we order
the students as in the rollbook), we are asking for grades of type $A^2B^3C^5$,
and we may apply Theorem 32.


*33 Example 


How many words may be made out of all the letters in the word “eerier”?
A word is understood to be an ordered sample of type $e^3r^2i^1$. Thus there
are 6!/3!2!1! = 60 words.
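For Example 33, a computer can simply list the distinct rearrangements. The Python sketch below (illustrative only) generates all 6! orderings of the letters, lets a set discard the repeats caused by equal letters, and compares the count with 6!/3!2!1!.

```python
from itertools import permutations
from math import factorial

word = "eerier"
# All 720 orderings of the letters; the set keeps each distinct "word" once
words = {"".join(p) for p in permutations(word)}
print(len(words), factorial(6) // (factorial(3) * factorial(2) * factorial(1)))  # 60 60
```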


*34 Theorem. The Multinomial Theorem 


$$(x_1+x_2+\cdots+x_k)^n=\sum_{r_1+\cdots+r_k=n}\frac{n!}{r_1!\cdots r_k!}\,x_1^{r_1}\cdots x_k^{r_k} \qquad (2.24)$$

In this equation the sum is extended over all k-tuples $(r_1,\ldots,r_k)$ in which
each $r_i$ is a nonnegative integer and $r_1+r_2+\cdots+r_k=n$.
The proof proceeds by considering the product

$$\underbrace{(x_1+\cdots+x_k)(x_1+\cdots+x_k)\cdots(x_1+\cdots+x_k)}_{n\ \text{factors}}$$

To find this product, we multiply any one of the summands $x_i$ in the first
factor by any summand in the second factor, etc., and add all the possibilities.
By Theorem 32, a term $x_1^{r_1}\cdots x_k^{r_k}$ occurs in exactly $n!/r_1!\cdots r_k!$ ways.
This is the result.
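The proof of Theorem 34 can be imitated literally: pick one summand from each of the n factors and record which term results. The Python sketch below does this for k = 3 and n = 4 (the function names are ours) and checks every coefficient against $n!/r_1!\cdots r_k!$; it illustrates the counting argument rather than proving the theorem.

```python
from collections import Counter
from itertools import product
from math import factorial, prod

def expand_by_choices(k, n):
    """Expand (x_1 + ... + x_k)^n by picking one summand from each of n factors.

    Returns a Counter mapping exponent tuples (r_1, ..., r_k) to coefficients.
    """
    terms = Counter()
    for choice in product(range(k), repeat=n):      # one summand index per factor
        exponents = tuple(choice.count(i) for i in range(k))
        terms[exponents] += 1
    return terms

def multinomial(*rs):
    return factorial(sum(rs)) // prod(factorial(x) for x in rs)

terms = expand_by_choices(3, 4)
assert all(c == multinomial(*rs) for rs, c in terms.items())
# Coefficient of x1^2 x2 x3, and the sum of all coefficients (every x_i = 1)
print(terms[(2, 1, 1)], sum(terms.values()))   # 12 81
```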


*35 Example 


Eighteen dice are thrown. What is the probability that each number shows 
up 3 times? 
If we let $S = \{s_1,\ldots,s_6\}$ be the sample space for one die, then we choose
$S^{18}$ ($= S\times S\times\cdots\times S$ for 18 factors) as the required sample space. As usual
we assume that $S^{18}$ is uniform. We are looking for the probability that a
sample point is of type $s_1^3s_2^3\cdots s_6^3$. The number of such sample points is
$18!/(3!)^6$, by Theorem 32. The total number of sample points is $6^{18}$. There-
fore, the required probability is

$$\frac{18!}{(3!)^6}\cdot\frac{1}{6^{18}}=\frac{18!}{6^{24}}$$

The numbers in this calculation are so large that one might well ask how 
the answer is evaluated. One way is to use a table of logarithms. Most large 
handbooks of tables include a table of logarithms of factorials. Thus the 
logarithm of 18! can be read off directly, and the calculation to 3 decimal 
places is routine if one knows how to use logarithms. 
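A modern stand-in for a table of logarithms of factorials is the log-gamma function, since $\ln n! = \ln\Gamma(n+1)$; Python's `math.lgamma` computes $\ln\Gamma$. The sketch below (an illustration, not the text's method) evaluates the probability of Example 35 both exactly with rational arithmetic and through sums of logarithms of factorials.

```python
from fractions import Fraction
from math import exp, factorial, lgamma, log

# Exact value: 18!/((3!)^6 * 6^18)
exact = Fraction(factorial(18), factorial(3) ** 6 * 6 ** 18)

# "Table of logarithms of factorials" approach: ln(n!) = lgamma(n + 1)
log_p = lgamma(19) - 6 * lgamma(4) - 18 * log(6)
approx = exp(log_p)

print(round(float(exact), 5), round(approx, 5))   # 0.00135 0.00135
```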

Most people expect an answer much larger than .00135. This is why
gamblers make money off the unsuspecting.

The following example generalizes Example 15.


36 Example 

An urn contains 90 black balls and 40 red balls. Ten balls are selected at 
random (without replacement). What is the probability that 7 black and 3 
red balls are chosen? 

It is most convenient to use unordered samples. (Compare with Example 
15 in which ordered samples were used.) We take as the sample space all 
unordered samples of 10 balls from among the 130. We regard this as a uni-
form space, because all possibilities seem equally likely. Thus there are
$\binom{130}{10}$ outcomes.
We now compute the “favorable” outcomes. A favorable outcome is any
set consisting of 7 black balls and 3 red balls. There are $\binom{90}{7}$ such sets of
black balls and $\binom{40}{3}$ sets of red balls. Hence there are $\binom{90}{7}\binom{40}{3}$ favorable out-
comes. Finally, the required probability is

$$p=\binom{90}{7}\binom{40}{3}\Big/\binom{130}{10}=.277$$

The best way to compute this answer is to use a table of logarithms and
a table of logarithms of factorials. Thus, using Equation 2.11, we may ex-
press the answer in terms of factorials:

$$p=\frac{90!\,40!\,10!\,120!}{7!\,83!\,3!\,37!\,130!}$$
Then, by referring to a table of logarithms of factorials, this product can
be easily evaluated. Incidentally, the numerator in the above expression for
p is about $2.94\times10^{391}$. The answer, .277, is tame enough.
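The same two routes work for Example 36. In the Python sketch below, `hypergeom` and `log_fact` are illustrative helper names of ours; the first uses exact integer binomial coefficients, and the second mirrors the factorial form and the logarithm-table method.

```python
from math import comb, exp, lgamma

def hypergeom(k_black, k_red, n_black=90, n_red=40):
    """P(k_black black and k_red red) when k_black + k_red balls are drawn
    without replacement from n_black black and n_red red balls."""
    return (comb(n_black, k_black) * comb(n_red, k_red)
            / comb(n_black + n_red, k_black + k_red))

def log_fact(n):
    return lgamma(n + 1)   # natural logarithm of n!

p_exact = hypergeom(7, 3)
# Same quantity from the factorial form 90! 40! 10! 120! / (7! 83! 3! 37! 130!)
p_logs = exp(log_fact(90) + log_fact(40) + log_fact(10) + log_fact(120)
             - log_fact(7) - log_fact(83) - log_fact(3) - log_fact(37) - log_fact(130))
print(round(p_exact, 3), round(p_logs, 3))   # 0.277 0.277
```

As a sanity check, the probabilities `hypergeom(k, 10 - k)` for k = 0, ..., 10 add up to 1.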


EXERCISES 


*1. An urn has 3 balls in it, colored azure, beige, and chartreuse. Nine 
balls are chosen at random (with replacement) from this urn. What is the 
probability that 2 azure, 3 beige, and 4 chartreuse balls are chosen? 


*2. In Exercise 1 find the probability that when 9 balls are chosen at
random, 2 of one color, 3 of another, and 4 of the third color are chosen.
Which is more likely: a 2-3-4 split or a 3-3-3 split? How much more likely
is one than the other? (Hint: Divide your two answers to find out how much 
more likely one is than the other.) 


*3. (Alternative proof of Theorem 32’.) If L is the set of r numbers
{1, 2, ..., r}, then we may form a partition into subpopulations $L_1, L_2, \ldots,
L_n$ (of sizes $r_1,\ldots,r_n$) by taking any arrangement of L, and then letting the
first $r_1$ elements be the elements of $L_1$, the next $r_2$ elements the elements of
$L_2$, etc. Using this idea and the division principle, prove Theorem 32’.


*4. Fifty people each shuffle a deck of cards and observe the top card.
What is the probability that 12 hearts, 12 clubs, 13 diamonds, and 13 spades 
are observed? Find the probability that exactly 12 hearts and 12 clubs are 
observed. Find the probability that 25 black cards and 25 red cards are 
observed. (Do not evaluate.) 


*5. A laboratory technician has 16 hamsters with colds. He wishes to 
test the effectiveness of the brands W, X, Y, and Z of cold remedies. He 
therefore separates them, at random, into 4 groups of 4 each, assigning the 
groups to the different remedies. It turns out, unknown to the technician, 
that 4 of the hamsters have colds that can be cured by any of the brands, 
while 12 have incurable colds. (The brands are equally effective.) 

a. What is the probability that the laboratory technician will discover 
that brand X cures all its hamsters, causing the laboratory to issue an 
erroneous report concerning the effectiveness of brand X ? 

b. What is the probability that brand X will cure 2 hamsters, while Y
and Z cure 1 apiece?

c. What is the probability that each brand cures 1 hamster?


*6. A bag of poker chips contains 7 white chips, 8 red chips, and 14 blue 
chips. Five chips are drawn at random. Find the probability that 
a. 1 white, 1 red, and 3 blue chips are drawn. 
b. no whites are drawn.
c. 2 red and 3 blues are drawn. 


*7. How many “words” may be made out of all the letters of the word
‘‘macadem’’? 


*8. How many 7-digit numbers are there which use all the digits 2, 2, 2,
3, 3, 5, 5?


*9. How many terms are there in the multinomial expansion (Equation
2.24)? 


*10. Evaluate $\sum 6!/a!\,b!\,c!$, where the sum is taken over all nonnegative
integers (a, b, c) such that a + b + c = 6. (Hint: Use Equation 2.24. Take
each $x_i = 1$.)


*11. A class of 30 is divided into 3 groups, A, B, and C, at random. (Thus
each student in this class is assigned the grade A, B, or C at random.) Is it
more likely that C has an odd or an even number of students? (Hint: Com-
pute $(1+1-1)^{30}$ using Equation 2.24.) Generalize this problem, still
using 3 classes, but making the number in the class arbitrary. Generalize,
making the number of classes arbitrary. (Cf. Equation 2.18.)


*12a, Find the term involving x?y’z* in the expansion of (x+y+z)?%. 
b. Find the term involving x’y* in the expansion of (1+x+y+z)’. 


13. If there are 80 good transistors and 20 bad transistors in a box, and if 7 
transistors are chosen (without replacement) at random, what is the prob- 
ability that 
a. all transistors are good? 
b. 5 are good and 2 are bad? 
c. 5 or more are good? 

Do not evaluate. 


14. Ten cards are chosen, without replacement, from a standard deck. 
Find the probability that 5 are red and 5 are black. 


*6 CARDS 


In this section we shall consider poker and bridge. It is necessary to start 
with a warning. As with most interesting games, these games are more than 
probability experiments. They are games of skill involving intricate rules. 
It is not enough to know the probabilities of various events. It is also re- 
quired to know how to play. But these probabilities provide a starting point 
for gaining some insight into these games. 
37 Poker


The first step in many poker games is to be dealt 5 concealed cards. For the
purposes of poker, an ace is regarded as a 1 or 14, and a jack, queen, and 
king are regarded as an 11, 12, and 13, respectively. The hands are classified 
according to the following scheme (listed in order of decreasing value): 


a. Royal flush: 10, J, Q, K, A of the same suit. 

b. Straight flush: 5 consecutive ranks in the same suit, except case a,
which is put into a special category.

c. Four of a kind: 4 cards of the same rank. 

d. Full house: 3 cards of one rank, 2 of another. 

e. Flush: 5 cards in the same suit, with the exception of case a or b. 

f. Straight: 5 consecutive ranks, not all of the same suit. 

g. Three of a kind: 3 cards of the same rank, the other 2 ranks different 
from each other and different from this rank. 

h. Two pair: 2 cards of the same rank, 2 cards of a different rank, and 1 
card of a rank different from these ranks. 

i. One pair: 2 cards of the same rank, the other 3 ranks different from 
each other and this rank. 

j. A bust: none of the above. 


To compute the probabilities, we take as a (uniform) sample space all
unordered samples of size 5 (without replacement) from the deck of 52 cards
(the population). There are $\binom{52}{5}$ = 2,598,960 possible hands. By computing
the number of hands of type a, b, ..., h, we may find the probability of each
such hand.

For example, how many hands of type h (2 pair) are there? We may
specify this hand using the following stages (framed as questions): 1. Which
2 ranks constitute the pairs? 2. What are the suits of the lower pair? 3. What
are the suits of the higher pair? 4. What is the odd card? Thus the hand
{3 He, 3 Cl, 5 Di, 5 Cl, J Sp} is specified by the stages as follows: (1) {3, 5},
(2) {He, Cl}, (3) {Di, Cl}, (4) J Sp. The odd card may be any of the 44 cards
whose rank differs from both pairs. The computation is as follows (the stage
number is shown under each factor):

$$\underbrace{\binom{13}{2}}_{(1)}\times\underbrace{\binom{4}{2}}_{(2)}\times\underbrace{\binom{4}{2}}_{(3)}\times\underbrace{44}_{(4)}=\frac{13\times12}{2}\times\frac{4\times3}{2}\times\frac{4\times3}{2}\times44=123{,}552$$

Thus there are 123,552 possible “2-pair” poker hands. The probability is

$$p(\text{2 pair})=\frac{123{,}552}{2{,}598{,}960}=.048=4.8\%$$

Thus one picks up 2 pair roughly 1 time in 20.
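The stage-by-stage count of 2-pair hands can be cross-checked by a different route. The Python sketch below (our own construction, not the text's) runs through the multisets of 5 ranks, multiplies the numbers of ways of choosing suits within each rank, and groups the results by the pattern of rank multiplicities. A hand with a repeated rank can be neither a straight nor a flush, so the count for the pattern (2, 2, 1) is exactly the number of 2-pair hands; the pattern (1, 1, 1, 1, 1), by contrast, lumps busts together with straights and flushes.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, prod

def count_by_rank_pattern():
    """Count 5-card hands by the pattern of rank multiplicities, e.g. (2, 2, 1)."""
    counts = Counter()
    for ranks in combinations_with_replacement(range(13), 5):
        mult = Counter(ranks).values()
        if max(mult) > 4:
            continue                              # only 4 cards of each rank exist
        suits = prod(comb(4, m) for m in mult)   # choose suits within each rank
        counts[tuple(sorted(mult, reverse=True))] += suits
    return counts

counts = count_by_rank_pattern()
print(sum(counts.values()), counts[(2, 2, 1)])   # 2598960 123552
```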
The computation of the other probabilities follows a similar pattern and
is left to the exercises. Note that because unordered samples were con- 
sidered, it was unnecessary to specify the order in which the cards were 
drawn. (Compare with Exercise 9, Section 2 of Chapter 2.) The probabilities 
satisfy a property we all suspected to be true. The higher the value of a 
hand, the less likely it is! 

Computations such as these give us indirect experimental evidence that
we are doing more than playing with numbers when probabilities are com-
puted. They can be experimentally verified. Each student in the author’s
class of 31 was asked to deal himself a poker hand 5 times (shuffling well at
each stage). The resulting figures are given in Table 2.17, along with the
theoretical probabilities (to the nearest tenth of 1 percent).


Table 2.17. Poker Experiment, with Probabilities

                                      Relative frequency    Probability
Hand                 Frequency        (percent)             (percent)

A bust                                46.7                  50.1
One pair                              44.2                  42.3
Three of a kind                                              2.1
Full house           1
Other                0                 0                     0

Total                165              99.9                 100.0
We note that in a game with 4 players, the number of possible initial deals
is much larger than $\binom{52}{5}$. In fact, we have a partition of the population of
52 cards into subpopulations of sizes 5, 5, 5, 5, and 32 (the remaining cards).
There are

$$\frac{52!}{(5!)^4\,32!}=1.48\times10^{24}$$

possible initial deals in a 4-handed poker game.
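The number of 4-handed deals can be computed either from the partition formula or hand by hand, and the two must agree. A short Python check (illustrative only; the variable names are ours):

```python
from math import comb, factorial

# Partition 52 cards into hands of 5, 5, 5, 5 and the 32 undealt cards
deals = factorial(52) // (factorial(5) ** 4 * factorial(32))
# The same number dealt hand by hand: C(52,5) C(47,5) C(42,5) C(37,5)
by_stages = comb(52, 5) * comb(47, 5) * comb(42, 5) * comb(37, 5)
print(deals == by_stages, f"{deals:.3g}")   # True 1.48e+24
```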
38 Bridge

Bridge is played with 4 people, North (N), East (E), South (S), and West
(W). Each person is dealt 13 cards at random, and the game begins. We con-
cern ourselves with initial probabilities and leave strategy to the experts.

A bridge deal is a division of a population of 52 cards into 4 subpopula-
tions, each of size 13. Thus there are $52!/(13!)^4 = 5.36\times10^{28}$ possible deals.
Occasionally, one reads that 4 people were playing bridge and that each
player was dealt 13 cards of the same suit (a so-called perfect hand). What
is the probability? The suits may be distributed in 4! = 24 different ways.
Hence the probability of such a distribution is $24/(5.36\times10^{28}) = 4.47\times10^{-28}$.
This is an unlikely event.

On the other hand, what is the probability that South picks up a perfect
hand? The simplest sample space is the $\binom{52}{13} = 6.35\times10^{11}$ (unordered) hands
consisting of 13 cards. Of these, there are 4 perfect hands (1 for each suit).
The probability is $4/\binom{52}{13} = 6.3\times10^{-12}$. This is also unlikely but about $10^{16}$
times more likely than the event of all players receiving such a hand.

Getting back to reality, what is the probability that a person picks up a
4-3-3-3 distribution?² To count the possibilities, there are 4 choices for the
suit with 4 cards, and, given this suit, there are $\binom{13}{4}$ ways of choosing the
cards in that suit. Similarly, there are $\binom{13}{3}$ ways of choosing the cards in each of the
other 3 suits. Thus there are $4\binom{13}{4}\binom{13}{3}^3$ ways of forming a 4-3-3-3 distribu-
tion. Finally, the probability of this (so-called “square”) distribution is

$$p(4\text{-}3\text{-}3\text{-}3)=\frac{4\binom{13}{4}\binom{13}{3}^3}{\binom{52}{13}} \qquad (= .1054)$$

On the other hand, similar reasoning (4 choices for the suit with 5 cards,
then 3 choices for the suit with 2 cards) gives

$$p(5\text{-}3\text{-}3\text{-}2)=\frac{4\cdot3\binom{13}{5}\binom{13}{3}^2\binom{13}{2}}{\binom{52}{13}}$$

Dividing these 2 probabilities, we have, after simplification,

$$\frac{p(5\text{-}3\text{-}3\text{-}2)}{p(4\text{-}3\text{-}3\text{-}3)}=\frac{81}{55}$$

Thus a 5-3-3-2 distribution is about 1½ times as likely as the 4-3-3-3 distribu-
tion.

² The numbers refer to the number of cards in each suit. However, the order of the suits is not speci-
fied. The 4 may refer to any of the suits.
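The bridge figures above can be reproduced with exact integer and rational arithmetic. The Python sketch below is an illustration with variable names of our own; it follows the counting arguments of the text step by step.

```python
from fractions import Fraction
from math import comb, factorial

deals = factorial(52) // factorial(13) ** 4        # all bridge deals
hands = comb(52, 13)                                # hands for one player

p_all_perfect = Fraction(24, deals)                 # 4! ways to assign the suits
p_south_perfect = Fraction(4, hands)                # one perfect hand per suit
p_4333 = Fraction(4 * comb(13, 4) * comb(13, 3) ** 3, hands)
p_5332 = Fraction(4 * 3 * comb(13, 5) * comb(13, 3) ** 2 * comb(13, 2), hands)

print(f"{deals:.3g}", f"{float(p_all_perfect):.3g}", f"{float(p_south_perfect):.2g}")
# 5.36e+28 4.47e-28 6.3e-12
print(round(float(p_4333), 4), p_5332 / p_4333)     # 0.1054 81/55
```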


*EXERCISES 


In the following exercises, evaluate the answers only where feasible, i.e.,
where the numbers are relatively small, or where cancellations simplify a 
problem, or where curiosity and (if necessary) a table of logarithms are at 
hand. 


1. Find the number of possibilities for each of the poker hands a to j of the 
text, and compute the probabilities. 


2. A poker player picks up a hand with 2 kings and 3 other cards (not 
kings and with different ranks). The rules permit him to exchange these 
cards for 3 others in the deck. He makes this exchange. What is the prob- 
ability that he improves his hand? 


3. As in Exercise 2, a poker player picks up 3 spades (3, 7, J) and 2 cards 
of another suit. He exchanges these 2 cards for 2 other cards in the deck. 
What is the probability that he draws a flush in spades? What is the prob- 
ability that he gets 1 pair or better? (You may assume that he did not 
exchange a 3, 7, or J.) 


4. Some people play poker with deuces wild. This means that a deuce (2)
may be regarded, at the player’s option, as any card (rank and suit arbitrary)
that the player specifies. In this case 5 of a kind is a possible poker hand. In a
deuces-wild game, which is more likely: a royal flush or 5 of a kind?


5. “Poker” may be played with 5 dice. Compute the probability of the
analogues of

a. 5 of a kind.      b. 4 of a kind.
c. a full house.     d. a straight.
e. 3 of a kind.      f. 2 pair.
g. 1 pair.           h. a bust.

Arrange in order of probability, the least likely first.
6. In bridge, find the probability that a player does not pick up an ace. 
7. In bridge, find the probability that all 4 players have aces. 


8. In bridge, find the probability that a player picks up 11 or more cards 
in the same suit. 


9. In 7-card stud a player is dealt 7 cards and he chooses the 5 cards that
give him the best hand. What is the probability of a royal flush? A straight 
flush? A flush? A full house? Two pair? 


CHAPTER 3 GENERAL THEORY OF FINITE PROBABILITY SPACES


INTRODUCTION 


If S is any probability space (Definition 1.6), we have defined an event A as
any subset of S (Definition 1.12). In Chapter 2 we worked with a uniform
space S and considered the problem of computing the probability p(A) by
counting the elements in A and in S and using Equation 1.11: p(A) = n(A)/
n(S).

In this chapter we shall learn some of the techniques for computing 
probabilities of events in an arbitrary finite probability space. It is well to 
keep in mind the purpose of many of the theorems we study. Simply put, we 
compute probabilities of complicated events by using the probabilities of the 
simpler events as a basis. Similarly, we build complicated sample spaces 
(these are the ones in real life) out of building blocks consisting of simple 
sample spaces. This chapter may well be subtitled “New Probabilities from
Old.”
1 UNIONS, INTERSECTIONS, AND COMPLEMENTS


In what follows, we shall assume that we have a fixed sample space
$S = \{s_1,\ldots,s_k\}$, with $p(s_i) = p_i$. All events, or sets, are understood to be
subsets of S. Much of what is stated applies to infinite spaces as well.

If A and B are sets, there are two fundamental operations that we can 
perform on A and B and one operation on A to form a new set. 


1 Definition
If A and B are sets, the union of A and B, denoted $A \cup B$, is the set of points
belonging to A or to B or to both A and B. The intersection of A and B,
denoted $A \cap B$, is the set of points belonging to both A and B. The
complement of A, denoted $\bar{A}$, is the set of points not belonging to A (but
belonging to S).

Figure 3.1 illustrates these definitions. 


[Figure 3.1: $A \cap B$, $A \cup B$, and $\bar{A}$, each shown shaded.]


2 Example 
In the game of black jack the dealer is dealt 2 cards, 1 concealed and 1
exposed. Let S be the set of all possible hands. Thus S is the set of ordered
samples (size 2) from the population of 52 cards. Suppose we define A to be
the event “The hidden card is an ace.” This means, in sample-space termino-
logy, that A is the set of all hands in which the hidden card is an ace. Suppose
B is the event “The exposed card is a 10, J, Q, or K.” Then we may define
several events in terms of A and B. For example, $A \cap B$ is the event “The
hidden card is an ace, and the exposed card is a 10, J, Q, or K.” $\bar{A} \cap B$ is
“The hidden card is not an ace, but the exposed card is a 10, J, Q, or K.”
$A \cup B$ is “The hidden card is an ace, or the exposed card is a 10, J, Q, or
K.” $\bar{B}$ is “The exposed card is not a 10, J, Q, or K.”

Some events cannot be expressed in terms of A and B by using intersec- 
tions, unions, or complements. Thus “The exposed card is an ace” is such 
an event. 
A simple rule of thumb determines whether unions, intersections, or
complements are involved:

The word “and” connecting two events usually suggests an intersection.

The word “or” suggests a union. (We always understand the word “or” to
be the same as the term “and/or.” Thus when we say that the car is red or the
chair is blue, we include the case that both the car is red and the chair is
blue. This is the universal meaning of “or” in mathematics, and it is the usual
meaning in the English language.)

The word “not” suggests a complement.

For this reason a perfectly plausible terminology is to use “A and B” for
“$A \cap B$,” “A or B” for “$A \cup B$,” and “not A” for “$\bar{A}$.” However, the nota-
tion we use is now rather standard.

The following results are useful for the computation of the probabilities of
events that are built up from other, simpler events.


3 Definition
The events A and B are said to be mutually exclusive if $A \cap B = \emptyset$.

Thus A and B are mutually exclusive if they cannot occur simultaneously.
For example, if S is a pack of cards, A = {red cards} and B = {spade picture
cards}, then A and B are mutually exclusive.


4 Theorem. On Mutually Exclusive Events

$$p(A \cup B)=p(A)+p(B) \qquad \text{if } A \text{ and } B \text{ are mutually exclusive} \qquad (3.1)$$


5 Theorem. On Complements

$$p(\bar{A})=1-p(A) \qquad (3.2)$$


To prove Theorem 4 we go to the definition of the probability of an event
(Definition 12 of Chapter 1). Writing $A = \{a_1,\ldots,a_s\}$, $B = \{b_1,\ldots,b_t\}$, we
have $A \cup B = \{a_1,\ldots,a_s,b_1,\ldots,b_t\}$. In the latter set the $a_i$’s and $b_j$’s are
all distinct, because A and B are mutually exclusive. By Definition 12 of
Chapter 1 we have

$$p(A \cup B)=p(a_1)+\cdots+p(a_s)+p(b_1)+\cdots+p(b_t)=p(A)+p(B)$$

This proves Theorem 4.

To prove Theorem 5 we note that A and $\bar{A}$ are mutually exclusive and that
$A \cup \bar{A} = S$. (This is an immediate consequence of the definition of $\bar{A}$.) Thus
$p(S) = p(A)+p(\bar{A})$ by Theorem 4. But, by Equation 1.13, p(S) = 1. Thus we
obtain

$$p(A)+p(\bar{A})=1 \qquad (3.3)$$

which is equivalent to Equation 3.2.
Equation 3.2 may be generalized. We say that $A \subset B$ if all elements of A
are also in B. In this case we let $B-A$ designate the set of elements that
are in B but not in A. ($B-A = B \cap \bar{A}$.) Then we have

$$p(B-A)=p(B)-p(A) \qquad (A \subset B) \qquad (3.4)$$

To see this, we note that $B = A \cup (B-A)$ and that A and $B-A$ are mutually
exclusive. (Thus B is the union of A and the set of points of B that are not in
A.) Hence $p(B) = p(A \cup (B-A)) = p(A)+p(B-A)$ by Equation 3.1.
Transposing p(A) we obtain Equation 3.4. Equation 3.2 is the special case
B = S.

Finally, we may generalize Theorem 4 to arbitrary sets A and B. 


6 Theorem
For any events A and B,

$$p(A \cup B)=p(A)+p(B)-p(A \cap B) \qquad (3.5)$$

To prove this we note that $A \cap B \subset B$. Furthermore, $A \cup B = A \cup (B -
A \cap B)$. (Thus, to form $A \cup B$ we adjoin to A the points of B that are not
already in A.) Hence we have, by Equations 3.1 and 3.4,

$$\begin{aligned}
p(A \cup B) &= p(A \cup (B-A \cap B)) \\
&= p(A)+p(B-A \cap B) \\
&= p(A)+p(B)-p(A \cap B)
\end{aligned}$$

We may regard Equation 3.5 as follows. To form the probability of $A \cup B$
we add the probability of A to the probability of B. But certain sample
points (those which belong to both A and B) are counted twice. Hence it is
necessary to subtract $p(A \cap B)$.


7 Example 

In a certain class, 35 percent of the students are hard working, 50 percent
are smart, and 10 percent are hard working and smart. What percentage is
either hard working or smart? What percentage is neither hard working
nor smart?

If H is the set of hard-working students and S is the set of smart students,
we have p(H) = .35, p(S) = .50, $p(H \cap S) = .10$. Thus $p(H \cup S) = p(H)+
p(S)-p(H \cap S) = .35+.50-.10 = .75$. Thus 75 percent are in the smart
or hard-working category. The complement of this group is the group of
students that are neither smart nor hard working. Thus

$$p(\bar{H} \cap \bar{S})=p(\overline{H \cup S})=1-p(H \cup S)=1-.75=.25$$


This example illustrates the following general properties of sets, known
as De Morgan’s laws:

$$\overline{A \cup B}=\bar{A} \cap \bar{B} \qquad \overline{A \cap B}=\bar{A} \cup \bar{B} \qquad (3.6)$$
By drawing a diagram similar to Fig. 1.1, the reader may readily convince
himself of the truth of Equations 3.6. 
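Equations 3.2, 3.4, 3.5, and 3.6 can also be checked mechanically on a small space. The Python sketch below uses a hypothetical 5-point sample space with unequal probabilities (the weights are arbitrary, chosen only so that they add up to 1) and tests the equations for every pair of events; such a check illustrates the results but does not replace the proofs.

```python
from fractions import Fraction
from itertools import chain, combinations

# A small, deliberately non-uniform sample space (illustrative weights summing to 1)
S = {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(1, 8),
     "s4": Fraction(1, 16), "s5": Fraction(1, 16)}

def p(event):
    """Probability of an event = sum of the probabilities of its points."""
    return sum((S[s] for s in event), Fraction(0))

def complement(event):
    return set(S) - event

def all_events():
    pts = list(S)
    subsets = chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))
    return [set(c) for c in subsets]

events = all_events()
for A in events:
    assert p(complement(A)) == 1 - p(A)                           # Equation 3.2
    for B in events:
        assert p(A | B) == p(A) + p(B) - p(A & B)                 # Equation 3.5
        assert complement(A | B) == complement(A) & complement(B)  # Equation 3.6
        assert complement(A & B) == complement(A) | complement(B)
        if A <= B:
            assert p(B - A) == p(B) - p(A)                        # Equation 3.4
print(len(events), "events checked")   # 32 events checked
```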

Example 7 is not a traditional “probability problem.” We are using the 
idea, given in Section 5 of Chapter 1, that a finite set may be regarded as a 
uniform sample space. 

It is possible to generalize the above considerations to more than two sets. 


8 Definition
If $A_1,\ldots,A_s$ are sets, the union $A_1 \cup \cdots \cup A_s$ is defined as the set of
points belonging to at least one $A_i$. The intersection $A_1 \cap \cdots \cap A_s$ is the
set of points belonging to all the $A_i$’s.

Definition 1 is the special case s = 2. We may now easily generalize
Theorem 4.


9 Definition
The sets $A_1,\ldots,A_s$ are mutually exclusive if any 2 of these sets are mutually
exclusive: $A_i \cap A_j = \emptyset$ for $i \ne j$.


10 Theorem. On Mutually Exclusive Events

$$p(A_1 \cup \cdots \cup A_s)=p(A_1)+\cdots+p(A_s) \qquad \text{if the sets } A_i \text{ are mutually exclusive} \qquad (3.7)$$


The proof is a straightforward extension of the proof of Theorem 4, and 
we omit it. 


11 Example 
Twenty coins are tossed. What is the probability that 17 or fewer heads
turn up?

As in Example 2.25, the probability that exactly r heads turn up is
$\binom{20}{r}/2^{20}$. If $E_r$ is the event that exactly r heads occur, we are looking for the
probability of the event

$$E=E_0 \cup E_1 \cup \cdots \cup E_{17}$$

However, the numbers are smaller and fewer if $\bar{E}$ is considered. $\bar{E}$ is the
event that 18 or more heads occur. Since the sets $E_r$ are clearly mutually
exclusive, we may use Equation 3.7 to obtain

$$\begin{aligned}
p(\bar{E}) &= p(E_{18})+p(E_{19})+p(E_{20}) \\
&= \left[\binom{20}{18}+\binom{20}{19}+\binom{20}{20}\right]\Big/2^{20} \\
&= (190+20+1)/2^{20} \\
&= 211/2^{20} = .00020
\end{aligned}$$

Finally, $p(E) = 1-p(\bar{E}) = .99980$ (to 5 decimal places).
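Example 11 can be computed exactly with rational arithmetic, both through the complement and directly from $E_0 \cup \cdots \cup E_{17}$. A Python sketch (the helper name `p_exactly` is ours):

```python
from fractions import Fraction
from math import comb

def p_exactly(r, n=20):
    """Probability of exactly r heads in n fair tosses: C(n, r)/2^n."""
    return Fraction(comb(n, r), 2 ** n)

p_not_E = sum(p_exactly(r) for r in (18, 19, 20))   # 18 or more heads
p_E = 1 - p_not_E                                   # Theorem 5
direct = sum(p_exactly(r) for r in range(18))       # E_0 U ... U E_17 directly
print(p_not_E, f"{float(p_E):.5f}", p_E == direct)  # 211/1048576 0.99980 True
```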


86 ELEMENTARY PROBABILITY THEORY 


Equations 3.1 through 3.5 have their counterparts in counting finite sets. 
Using Equations 1.14 and 1.15 we can convert any statement about prob- 
abilities into one about numbers. All that is required is the number of 
elements in the set, which we regard as a uniform sample space. Thus we 
have the following theorem. 


12 Theorem 
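Let S be a set with k elements, and let A and B be subsets of S. Then

$$n(A \cup B)=n(A)+n(B) \qquad \text{if } A \text{ and } B \text{ are mutually exclusive} \qquad (3.8)$$

$$n(\bar{A})=k-n(A) \qquad (3.9)$$

$$n(B-A)=n(B)-n(A) \qquad \text{if } A \subset B \qquad (3.10)$$

and

$$n(A \cup B)=n(A)+n(B)-n(A \cap B) \qquad (3.11)$$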


For a proof, it is sufficient to multiply Equations 3.1, 3.2, 3.4, and 3.5 by 
k, and to use Equation 1.15. 

Equation 3.8 is so basic that it is often used to define the sum of 2 numbers 
in a theoretical formulation of arithmetic. We used it in Example 1.14, 
where A and B were called the “cases.” Equation 3.11, a generalization, 
even permits the “cases” to overlap.


13 Example

A player draws 3 cards (without replacement) from a deck. He wins if they
are all the same color or if they form a straight (3 consecutive ranks). What is
his probability of winning?

We use the uniform probability space of unordered triples. There are
$k=\binom{52}{3}=\frac{52\cdot51\cdot50}{3\cdot2\cdot1}=22{,}100$ elementary events. We let C be the
set of hands with cards of the same color and S the set of hands that form
straights. (As usual, we interpret an ace as a 1 or a 14.) If W is the set of
winning hands, $W = S \cup C$ and

$$n(W)=n(S)+n(C)-n(S \cap C) \qquad (3.12)$$

A straight is determined by the starting point and then the suits for the 3
cards. Since we may start a straight anywhere from A through Q (1 through
12), we have $n(S) = 12 \times 4^3 = 768$. A hand in C is specified by the color
(red or black) and then a selection of 3 cards of that color. Hence $n(C) =
2 \times \binom{26}{3} = 5{,}200$. Finally, a hand of $S \cap C$ is specified by (1) the color, (2)
the beginning rank, and (3) the 3 suits (in the color selected) from high to low.
Hence $n(S \cap C) = 2 \times 12 \times 2^3 = 192$. Thus, by Equation 3.12, we have

$$n(W)=768+5{,}200-192=5{,}776$$

and finally

$$p(W)=\frac{n(W)}{k}=\frac{5{,}776}{22{,}100} \qquad (= .261)$$
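Example 13 is small enough to check by listing all 22,100 hands. In the Python sketch below, the card encoding and the helper names are our own illustrative choices; an ace is rank 1, and the straight test also accepts Q-K-A, following the text's convention that an ace may count as 1 or 14.

```python
from fractions import Fraction
from itertools import combinations

# Cards as (rank, suit); ranks 1..13 with 1 = ace
SUITS = {"S": "black", "C": "black", "H": "red", "D": "red"}
DECK = [(rank, suit) for rank in range(1, 14) for suit in SUITS]

def same_color(hand):
    return len({SUITS[s] for _, s in hand}) == 1

def is_straight(hand):
    """3 consecutive ranks; an ace may count as 1 or as 14."""
    ranks = sorted(r for r, _ in hand)
    low = ranks == [ranks[0], ranks[0] + 1, ranks[0] + 2]
    ace_high = ranks == [1, 12, 13]          # Q-K-A
    return low or ace_high

hands = list(combinations(DECK, 3))
wins = sum(1 for h in hands if same_color(h) or is_straight(h))
print(len(hands), wins, round(float(Fraction(wins, len(hands))), 3))   # 22100 5776 0.261
```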


EXERCISES 


1. Let S be the integers from 1 through 10 inclusive. Let A be the odd
integers of S, let B be the primes {2, 3, 5, 7} of S, and let C be the numbers
from 3 through 8. Find


a. AUB b. B 

cANc da ANBNC 
eANBNcC fi. ANBNC 
gAUBUC h AUBUC 


2. In Exercise 1 verify Equation 3.11 for the sets A and B, for B and C, 
and for A and C. 


3. Let S be the set of integers (positive, negative, or zero), P the set of
positive integers, E the set of even integers, T the multiples of 3, and Sq
the set {0, 1, 4, 9, ...} of squares. Express each of the following, if possible,
using unions, intersections, or complements. (Warning: Some may not be
possible.)


a. The odd integers.            b. The positive even integers.
c. The odd squares.             d. All multiples of six.

e. The nonnegative integers.    f. All squares larger than 2.
g. {0}.                         h. {0, 1}.


4. Suppose $A \,\Delta\, B$ is defined to be the set of elements in A or in B but
not in both. Derive a formula for $p(A \,\Delta\, B)$.


5. Let S be the sample space given in the following table: 


Let A = {a, b, c}, B = {a, c, e, f}, C = {c, d, e, f, g}.
a. FindA,B,C,A U B,BNC,ANBOC,A UB. 
b. Find the probabilities of the events in part a.
c. Verify Theorem 6 for $p(A \cup C)$.
d. Verify Equation 3.6 for $B \cup C$ and for $A \cap B$.


6. There are 20 apples in a bag which are either red or rotten. There are
12 red apples and 13 rotten ones. How many red, rotten apples are there?
State specifically which formula you are using, and identify your sets.


7. It is observed that 7 percent of the words on a certain page begin with
the letter “e” and 6 percent end with “e.” Suppose that 1 percent begin and
end with “e.” What percentage begins or ends with “e”?


8. Two investigators are given a very difficult code to crack. It is known 
from past experience that each investigator has probability .18 of cracking 
such a code. What can you say about the probability that the code will be 
cracked? Explain. 


9. Three dice are tossed. What is the probability that at least one 6 
occurs? Similarly, find the probability of at least one 6 when 4 dice are 
tossed. (Hint: Look at the complement.) 


10. Three dice are tossed. What is the probability that some number will 
be repeated? Do the same for 4 dice. 


11. An integer from 1 to 100 inclusive is selected at random. What is the 
probability that it is either a square or a multiple of 3? 


12. Ten coins are tossed. What is the probability that the number of heads 
is between 3 and 7 inclusive? (You may use Table 2.13.) 


13. How many integers between 1 and 600 inclusive are divisible by 
either 3 or 5? 


14. An urn contains 7 black, 8 red, and 9 green marbles. Five marbles are 
selected (without replacement) at random. What is the probability that some 
color will be missing? (Do not evaluate.) 


15a. Two cards are chosen from a deck. What is the probability that 
at least 1 ace is chosen or that both have the same color? 
b. If 3 cards are chosen, what is the probability that an ace is chosen or 
that the cards have the same color? 


16. A packet of seeds contains 100 seeds, of which 85 will germinate. Ten 
seeds are chosen at random and planted. What is the probability that 7 or 
more will germinate? (Do not evaluate.) 


17. If p(A) and p(B) are known, then the value of p(A ∪ B) cannot be
exactly determined, because Equation 3.5 involves the unknown value of
p(A ∩ B). In each of the following, determine how large and how small
p(A ∪ B) can be.

a. p(A) = .1, p(B) = .2    b. p(A) = .5, p(B) = .4
c. p(A) = .6, p(B) = .7    d. p(A) = .2, p(B) = .9
e. p(A) = .9, p(B) = .95


18. Give a rule that determines how large and how small p(A ∪ B) can
be in terms of the values of p(A) and p(B).


19. In Exercise 17 determine how large and how small p(A ∩ B) can be.


20. In analogy with Exercise 18, estimate the size of p(A ∩ B) in terms
of the size of p(A) and p(B).


2 CONDITIONAL PROBABILITY 


Table 3.2 is adapted from government sources¹ and gives data on unemployment
by sex and age. We may regard Table 3.2 as giving the results of an
“experiment” that had 8 possible outcomes.

If we wish to analyze the unemployed males by age, we may use the 
appropriate entries in Table 3.2 to construct a new table. In this case we 
restrict the sample space to the event “male” and we arrive at Table 3.3. 


3.3 Unemployed Males, Age 20 Years and Over, by Age and Sex, 1960
(Columns: Age, Number (in thousands), Percentage. Entries quoted in the text: 20-24, 369; 25-44, 907 (44.1 percent); all males, 2,058.)

When the outcomes of an experiment are restricted in this manner to a
specified event, we speak of the conditional relative frequency of an outcome
or an event. We have used the notation f(A) or f(s) for the relative frequency
of an event or outcome. We shall use the notation f(A|B) (read: f of A,
given B) to designate the relative frequency of the event A when the event
B is regarded as the total sample space. For example, to compute f(25-44|male)
we read, from Table 3.2 or 3.3, that 907,000 males are in the age
bracket 25-44, and that there are 2,058,000 males. Thus f(25-44|male) =
907/2,058 = .441, as indicated in Table 3.3. Note that the numerator 907
was not the total in the 25-44 category; it was the total in the (25-44) ∩
male category! We now generalize this procedure.


1 U.S. Bureau of the Census, Statistical Abstract of the United States: 1962 (eighty-third 
edition), Government Printing Office, Washington, D.C., 1962. 
14 Definition
Suppose an experiment has outcomes s₁, ..., s_k, and that A and B are events.
The relative frequency of A, given B, is defined by the equation

$$f(A \mid B) = \frac{n(A \cap B)}{n(B)} \tag{3.13}$$

Here we again use the notation n(B) to denote the number of occurrences
(the frequency) of B. Again, the numerator is not n(A), but n(A ∩ B).
Equation 3.13 is not defined if n(B) = 0. Nor do we wish to define the
relative frequency of an event A given an event B that has not occurred.

To illustrate Definition 14 again, with a different base population B, let us
find the percentage of males among the unemployed in the age category
20-44. Referring to Table 3.2, we find n(20-44) = 369 + 214 + 907 + 516 =
2,006, while n(males ∩ (20-44)) = 369 + 907 = 1,276. Thus f(males|(20-44))
= n(males ∩ (20-44))/n(20-44) = 1,276/2,006 = 63.6 percent. Among
the population of unemployed in the age group 20-44, 63.6 percent are males.

Just as the relative frequency of an event may be computed from the
relative frequencies of the outcomes in it (Theorem 1.10), we can compute
the conditional relative frequency f(A|B) directly from relative frequencies.


15 Theorem
Let A and B be events in some experiment with the relative frequency
f(B) ≠ 0. Then

$$f(A \mid B) = \frac{f(A \cap B)}{f(B)} \tag{3.14}$$

Proof. If a total of N experiments are performed, then, by Definition 1.9,

$$f(B) = \frac{n(B)}{N} \quad \text{and} \quad f(A \cap B) = \frac{n(A \cap B)}{N}$$

Dividing these equations, we obtain, using Equation 3.13,

$$\frac{f(A \cap B)}{f(B)} = \frac{n(A \cap B)/N}{n(B)/N} = \frac{n(A \cap B)}{n(B)} = f(A \mid B)$$

This is the result.

To illustrate this theorem, let us compute f(males|(20-44)) once again.
We have, using Table 3.2, f(20-44) = .118 + .068 + .289 + .165 = .640 and
f(males ∩ (20-44)) = .118 + .289 = .407. Hence, by Equation 3.14,
f(males|(20-44)) = .407/.640 = 63.6 percent, as before.

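Equations 3.13 and 3.14 can be checked against each other numerically. The following is a minimal Python sketch, assuming only the four counts for ages 20-24 and 25-44 quoted above from Table 3.2 (in thousands) and the corresponding rounded relative frequencies; the dictionary keys and the function name conditional_rel_freq are our own labels, not the table's.

```python
from fractions import Fraction

# Unemployed counts (thousands) quoted from Table 3.2; keys are our own labels.
counts = {
    ("male", "20-24"): 369,
    ("female", "20-24"): 214,
    ("male", "25-44"): 907,
    ("female", "25-44"): 516,
}

def conditional_rel_freq(counts, in_a, in_b):
    """Equation 3.13: f(A|B) = n(A ∩ B) / n(B), with A and B given as predicates."""
    n_b = sum(n for key, n in counts.items() if in_b(key))
    if n_b == 0:
        raise ValueError("f(A|B) is not defined when n(B) = 0")
    n_ab = sum(n for key, n in counts.items() if in_a(key) and in_b(key))
    return Fraction(n_ab, n_b)

def is_male(key):
    return key[0] == "male"

def is_20_to_44(key):
    return key[1] in ("20-24", "25-44")

from_counts = conditional_rel_freq(counts, is_male, is_20_to_44)

# Equation 3.14 with the rounded relative frequencies quoted in the text.
f_20_44 = 0.118 + 0.068 + 0.289 + 0.165
f_male_and_20_44 = 0.118 + 0.289
from_freqs = f_male_and_20_44 / f_20_44

print(from_counts, round(float(from_counts), 3), round(from_freqs, 3))
```

The two routes agree to three places; beyond that they differ slightly only because the relative frequencies in the table are rounded.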
Since probabilities may be closely approximated by relative frequencies if
a large number of trials are performed, Theorem 15 suggests the following
definition of the probability of A, given B.

16 Definition


Let S be a probability space, and let A and B be events of S with p(B) ≠ 0.
Then the probability of A, given B, or the conditional probability of A,
given B, is defined by the formula

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)} \tag{3.15}$$

In the light of Equation 3.14, we see that the conditional probability 
p(A|B) is approximated by the conditional relative frequency f(A|B) if a 
large number of trials are performed. We might also note, as indicated in 
Section 5 of Chapter 1, that any table of frequencies may be viewed as a 
probability space. From this point of view, Equation 3.14 is a special case 
of Equation 3.15. 

We have pointed out that behind any probability there is, in theory, an 
experiment. It is well to state the experiment corresponding to p(A|B). 

If A and B are events in an experiment with sample space S, and if
p(B) ≠ 0, then to find an experimental value for p(A|B), we repeat the
experiment many times, ignoring all outcomes except those in B, and find the
relative frequency of the outcomes in B that are also in A.

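The experimental recipe just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the events (one fair die, A = “a 6 turns up,” B = “an even number turns up”), the helper name estimate_conditional, the seed, and the number of repetitions are hypothetical choices, not taken from the text. Here the exact value is p(A|B) = (1/6)/(1/2) = 1/3.

```python
import random

def estimate_conditional(trial, in_a, in_b, repetitions, rng):
    """Estimate p(A|B): repeat the experiment, discard outcomes not in B,
    and return the fraction of the kept outcomes that are also in A."""
    kept = 0
    hits = 0
    for _ in range(repetitions):
        outcome = trial(rng)
        if not in_b(outcome):
            continue                # ignore every outcome outside B
        kept += 1
        if in_a(outcome):
            hits += 1
    if kept == 0:
        raise ValueError("B never occurred, so no estimate is possible")
    return hits / kept

rng = random.Random(1)              # fixed seed so the run is reproducible
estimate = estimate_conditional(
    trial=lambda r: r.randint(1, 6),        # toss one fair die
    in_a=lambda face: face == 6,            # A: a 6 turns up
    in_b=lambda face: face % 2 == 0,        # B: an even number turns up
    repetitions=60_000,
    rng=rng,
)
print(estimate)   # should be close to the exact value 1/3
```

About half of the 60,000 tosses are kept, so the estimate should typically land within about .01 of 1/3.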

17 Example 


A die is tossed repeatedly. What is the (conditional) probability that 6 
turns up on or before the fourth toss, given that it turns up somewhere in 
the first 10 tosses? 

We refer to Table 1.6 and take for granted the probabilities in that table.
Let A be the event “4 or less throws required” and let B be the event “10 or
less throws required.” We wish to find p(A|B). Here A ∩ B = A, because
any occurrence of A implies an occurrence of B. Thus p(A|B) = p(A)/p(B).
From Table 1.6 we read p(A) = .167 + .139 + .116 + .096 = .518 and
p(B) = 1 − (.027 + .022 + .112) = 1 − .161 = .839. [Here it is easier to compute
p(B̄) and then p(B).] Thus p(A|B) = .518/.839 = .617.

The (unconditional or absolute) probability of tossing a 6 in 4 or less trials 
is p(A) = .518. We see that, in this case, it was intuitively clear that p(A|B) 
was larger than p(A), because if we failed to toss a 6 in the first 4 attempts, 
it was still possible to avoid a 6 in the next 6 attempts, so the failure would 
be “washed out.” 
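Example 17 can also be checked without the rounded table. The sketch below assumes, as Table 1.6 evidently does, that the tosses are independent and that the first 6 appears on toss k with probability (5/6)^(k−1)(1/6); the function name p_first_six_on is our own.

```python
from fractions import Fraction

def p_first_six_on(k):
    """Probability that the first 6 appears on toss k (fair die, independent tosses)."""
    return Fraction(5, 6) ** (k - 1) * Fraction(1, 6)

p_A = sum(p_first_six_on(k) for k in range(1, 5))    # 4 or fewer tosses
p_B = sum(p_first_six_on(k) for k in range(1, 11))   # 10 or fewer tosses

# A ⊂ B, so A ∩ B = A and p(A|B) = p(A)/p(B).
p_A_given_B = p_A / p_B
print(float(p_A), float(p_B), float(p_A_given_B))
```

The result agrees with .617 to three places. The exact p(B) = 1 − (5/6)^10 is about .8385; the value .839 in the text comes from the rounded table entries.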

In this example p(A|B) = p(A)/p(B), because A was included in B. (If A 
occurred, B occurred automatically.) For later reference we record this 
special case of Equation 3.15: 


$$p(A \mid B) = \frac{p(A)}{p(B)} \quad \text{if } A \subset B \tag{3.16}$$


If B = S (the entire sample space), we automatically have A ⊂ S. Also,
p(S) = 1. Thus

$$p(A \mid S) = p(A) \tag{3.17}$$


Thus any probability may be regarded as a conditional probability, given S. 


18 Example 
Two cards are chosen from a deck. At least 1 of the cards is red. What is the 
probability that both are red? 

We let A be the event that both are red, and we let B be the event that at
least 1 is red. Again A ⊂ B. We use ordered samples to compute the
probabilities: p(A) = (26 · 25)/(52 · 51) = 25/102. B̄ is the event that both are black. Thus
p(B̄) = 25/102, as with p(A), and hence p(B) = 1 − 25/102 = 77/102. Finally, using
Equation 3.16, we have p(A|B) = (25/102)/(77/102) = 25/77 = .325.

Another way of obtaining this answer is as follows. There are 52 · 51
possible hands. Of these, 26 · 25 hands are all black. Hence there are
52 · 51 − 26 · 25 = n(B) hands with at least 1 red card, and all of these are
equally likely. Of these hands, there are 26 · 25 = n(A) all-red hands. Thus

$$p(A \mid B) = \frac{26 \cdot 25}{52 \cdot 51 - 26 \cdot 25} = \frac{25}{102 - 25} = \frac{25}{77}$$


This technique may be used for uniform probability spaces. 
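The counting argument can be confirmed by brute force. A minimal Python sketch, assuming a standard 52-card deck encoded as (rank, suit) pairs of our own choosing, enumerates all 52 · 51 ordered samples and forms the ratio n(A ∩ B)/n(B):

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]

def is_red(card):
    return card[1] in ("H", "D")        # hearts and diamonds are red

hands = list(permutations(deck, 2))     # ordered samples, all equally likely
n_B = sum(1 for h in hands if any(is_red(c) for c in h))        # at least 1 red
n_A_and_B = sum(1 for h in hands if all(is_red(c) for c in h))  # both red (A ⊂ B)
p_A_given_B = Fraction(n_A_and_B, n_B)
print(len(hands), n_B, n_A_and_B, p_A_given_B)
```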


19 Theorem 
If S is a finite uniform probability space and if B ≠ ∅, then

$$p(A \mid B) = \frac{n(A \cap B)}{n(B)} \tag{3.18}$$


The proof follows directly from Equation 3.15, using Formula 1.11, 
which is valid for uniform spaces. The details are left to the reader. 


20 Example 
Two cards are chosen from a deck. The first is red. What is the probability 
that the second is also red? 

Since the problem is phrased in terms of first and second cards, ordered
samples are called for. We let R₁ be the event “first card is red” and we let
R₂ be the event “second card is red.” Then it is required to find

$$p(R_2 \mid R_1) = \frac{p(R_1 \cap R_2)}{p(R_1)}$$

We have p(R₁ ∩ R₂) = (26 · 25)/(52 · 51), while p(R₁) = (26 · 51)/(52 · 51). Thus

$$p(R_2 \mid R_1) = \frac{26 \cdot 25}{52 \cdot 51} \Big/ \frac{26 \cdot 51}{52 \cdot 51} = \frac{26 \cdot 25}{52 \cdot 51} \cdot \frac{52 \cdot 51}{26 \cdot 51} = \frac{25}{51}$$
Several interesting questions arise in connection with this problem. First,
the answer 25/51 is obvious. Clearly if the first card was red, the probability
that the second card is red is 25/51, because there are 25 red cards among the 51
(equally likely) cards left over, regardless of what the first red card is. We
shall justify this technique in Section 3.5, but for the present we note that
the obvious answer 25/51 was verified in Example 20. In what follows we shall
use the “obvious” answer without further justification.

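Both routes to 25/51 can be checked by enumeration. The Python sketch below (with a (rank, suit) card encoding of our own) computes p(R₂|R₁) from counts in the ordered sample space and then confirms the “obvious” argument: for every possible red first card, 25 of the remaining 51 cards are red.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]

def is_red(card):
    return card[1] in ("H", "D")

pairs = list(permutations(deck, 2))           # ordered (first, second) samples
n_R1 = sum(1 for first, _ in pairs if is_red(first))
n_R1_and_R2 = sum(1 for first, second in pairs if is_red(first) and is_red(second))
p_R2_given_R1 = Fraction(n_R1_and_R2, n_R1)   # Equation 3.18

# The "obvious" argument: whatever red card comes first, 25 of the 51 left are red.
obvious_values = set()
for first in deck:
    if is_red(first):
        remaining = [card for card in deck if card != first]
        obvious_values.add(Fraction(sum(map(is_red, remaining)), len(remaining)))

print(p_R2_given_R1, obvious_values)
```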
Similarly, another “obvious” computation above was p(R₁) = 1/2. Indeed,
of the 52 (equally likely) cards in the deck, there were 26 red cards. Yet our
computation used ordered couples: There were 26 red cards (first stage),
then 51 cards (second stage), etc. Why do it this way? We have stressed the
importance of the probability space throughout this text, and it is crucial
that we understand that all events are taken to be subsets of a given
probability space. The underlying reason that the simple computation
[p(R₁) = 26/52] works is our choice of a uniform probability space for the possible hands.
We shall also clarify this in Section 5, but we shall henceforth use the simpler
method, when applicable.

Many people who write p(R₁) = 1/2 without thought balk at the equation
p(R₂) = 1/2. The common attitude is that p(R₂) “depends on what color the
first card is.” However, this is needless balking, because the same line of
reasoning would show that p(R₁) depends similarly on what color the second
card is! The situations are, in fact, identical except for a time sequence.
Even this time sequence is an illusion, for it is somewhat arbitrary which
card is called the first. In fact, p(R₁) and p(R₂) do not “depend on the other
card.” They are not conditional probabilities.

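A short enumeration makes the symmetry concrete. Under the same hypothetical (rank, suit) encoding, the sketch below counts, over all 52 · 51 ordered pairs, how often the first card is red and how often the second card is red; neither count is conditioned on the other card.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]

def is_red(card):
    return card[1] in ("H", "D")

pairs = list(permutations(deck, 2))
p_R1 = Fraction(sum(1 for first, _ in pairs if is_red(first)), len(pairs))
p_R2 = Fraction(sum(1 for _, second in pairs if is_red(second)), len(pairs))
print(p_R1, p_R2)   # both are unconditional probabilities
```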
Our final example is also taken from cards. The results are somewhat 
surprising at first glance. 


21 Example 


Two cards are chosen at random from a deck. One is a king. What is the 
probability that both are kings? If one is a king of spades, what is the 
probability that both are kings? 


To answer the first part, there are $\binom{52}{2}$ = 1,326 possible hands. Of these
there are $\binom{48}{2}$ = 1,128 hands without kings. Thus there are 1,326 − 1,128 =
198 hands with at least 1 king present. Clearly, there are $\binom{4}{2}$ = 6 hands with
2 kings. Thus, by Theorem 19, the required conditional probability is

$$p(\text{both kings} \mid \text{one is a king}) = \frac{6}{198} = \frac{1}{33} \quad (= 3.03\%)$$

The calculation for the second part is similar. There are 51 hands with the
king of spades. Of these, 3 have 2 kings. Thus the required probability here is

$$p(\text{both kings} \mid \text{one is the king of spades}) = \frac{3}{51} = \frac{1}{17} \quad (= 5.88\%)$$


almost double the first answer. 
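The two answers in Example 21 can be reproduced by listing all 1,326 unordered hands. In this minimal Python sketch the rank and suit characters are an encoding of our own, and “one is a king” is read, as in the text, as “at least one is a king.”

```python
from fractions import Fraction
from itertools import combinations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]
hands = list(combinations(deck, 2))        # unordered hands, all equally likely

def kings(hand):
    return sum(1 for rank, _ in hand if rank == "K")

at_least_one_king = [h for h in hands if kings(h) >= 1]
with_king_of_spades = [h for h in hands if ("K", "S") in h]

# Theorem 19: count the two-king hands inside each conditioning event.
p_given_some_king = Fraction(sum(kings(h) == 2 for h in at_least_one_king),
                             len(at_least_one_king))
p_given_king_spades = Fraction(sum(kings(h) == 2 for h in with_king_of_spades),
                               len(with_king_of_spades))
print(len(hands), len(at_least_one_king), p_given_some_king)
print(len(with_king_of_spades), p_given_king_spades)
```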

We have noted in Example 20 that p(both cards red|first card is red)
can be computed by taking a specific instance (say the 3 of hearts)
for the first card. However, p(both cards kings|one is a king) cannot be com- 
puted by taking a specific instance (say the king of spades). We shall go into 
this in detail in Section 5, but for the present we may say that the reason for 
this disparity is that in Example 20 the specific instances all yielded the same 
answer and were mutually exclusive. In Example 21, the four specific in- 
stances (one is the king of spades, one is the king of clubs, etc.) all yielded the 
same answer but were not mutually exclusive. It is this feature that dis- 
tinguishes the two problems. 

In this section we defined the conditional probability p(A|B) of an event A 
relative to an event B. We have regarded it, intuitively, as a certain kind of 
probability of the event A. Yet we have not discussed the underlying prob- 
ability space and the probabilities of its elementary events in such a way as 
to make p(A|B) a bona fide probability of an event, in the sense of Definition 
12 of Chapter 1. This will be done in Section 3.5, and the interested reader 
can refer to Definition 39 of that section to see how this is done. 


EXERCISES 


1. In Table 3.2 find the percentage of females in the 45-and-over age 
category. Use both Definition 14 and Theorem 15. 


2. When 3 dice are tossed, what is the probability that the high number 
is 3 or under, given that it is 5 or under? (Use the results of Table 1.3.) 


3. Using Table 1.12, find the probability that the sum 7 occurs when 2 
dice are tossed, given that a sum between 5 and 9 inclusive occurred. 


4. Using Table 2.17, find the probability of receiving “2 pair” in poker, 
given that a “bust” did not occur. Compare with the conditional relative
frequency using the experimental results of that table. 


5. Show that if A ⊂ B and p(B) ≠ 0, the conditional probability p(A|B)
is greater than or equal to the “absolute” probability p(A). When does 
equality hold? 


6. Find the probability of tossing the sum 9 or more on 2 dice given that 
1 of the dice is 3 or more. Find the probability, given that the first die is 3 or 
more. 


7. In a poker hand (5 cards), what is the probability of exactly 2 aces, 
given that there is 1 or more ace in the hand? (Do not evaluate.) 
8. An urn contains 10 black balls and 5 red balls. Three balls are chosen
(without replacement) at random. Given that they are not all red, find the 
probability that they are all black. 


9. Three cards are chosen at random from a deck. Find the probability 
that they are all aces. Find the probability that they are all aces given that 
they all have different suits. Find the probability that they are all aces, given 
that they are not all the same color. 


10. In applying to schools A and B, Leonard decided that the probability 
that he will be accepted by A or B is .7 or .8, respectively. He also feels that
there is a probability .6 of being accepted by both. 

a. What is his probability of being accepted by at least one of the 
schools? 

b. If he is accepted by B, what is the probability that he will also be 
accepted by A? 


11. If a 5-card poker hand has cards of the same color, what is the con- 
ditional probability that they are of the same suit? 


12. A soap firm knows that 23 percent of its customers are magazine 
readers, 72 percent are television viewers, and 12 percent are magazine 
readers and television viewers. If a customer is found to be a magazine 
reader, what is the probability that he is a television viewer? Suppose he is 
a television viewer. What is the probability that he reads magazines? 


13. If B has a high probability of occurring, then p(A|B) is almost the 
same as p(A). Illustrate this statement for the case p(B) = .9, p(A) = .6. 
Find upper and lower bounds² for p(A|B). Do the same for p(A) = .6 and
p(B) = .99. (Hint: See Exercises 19 and 20 of the previous section.) 


14. An 8-card deck is composed of the 4 aces and the 4 kings. Two cards 
in a specific order are chosen at random. Find the probability that both are 
aces, given that 

a. the first card is an ace. 
b. the hand contains the ace of spades. 
c. the hand contains an ace. 


15. Describe an experiment that may be performed to find, experi- 
mentally, the probabilities described in Exercise 14b and c. Run the experi- 
ment to obtain 50 readings on the probability in Exercise 14c. Keep track 
of the readings relevant to Exercise 14b, and in this manner obtain experimental
conditional probabilities for Exercise 14b and c.


2 Thus find how large p(A | B) can possibly be and, similarly, how small it may be. 
3 PRODUCT RULE


The notion of conditional probability has wide application in practical 
situations. For example, to determine insurance rates, insurance companies 
find the probability that a driver will get involved in an auto accident. 
Experience shows them that the probabilities vary according to age, sex, 
experience, etc. Thus they effectively reduce the sample space and are 
dealing with conditional probabilities. (The rate structure reflects this 
fact.) Similarly, if a school wishes to find the probability that a student 
receives grade B or better, a thorough analysis would find this probability 
for freshmen, sophomores, etc., and perhaps according to other categories. 
Again, conditional probabilities are involved, because the sample space is 
restricted. In this section we learn how to apply information about con- 
ditional probability. 


22 Theorem. The Product Rule

$$p(A \cap B) = p(A) \cdot p(B \mid A) \tag{3.19}$$

This formula is merely another version of Equation 3.15. (The roles of A
and B are reversed, and it has been cleared of fractions.) It is used to compute
p(A ∩ B) with the help of a known conditional probability p(B|A).


23 Example 

In a certain high school, 40 percent of the seniors will go to college next 
year. The seniors are 18 percent of the school population. What percentage 
of the school will go to college next year? 


If we let S = seniors and C = college-bound students, we have p(C|S) = 
.40 and p(S) = .18. Thus 


$$p(S \cap C) = p(S) \cdot p(C \mid S) = (.18)(.40) = .072 = 7.2\%$$


(We are assuming that only the seniors can go to college next year.) 
In this example we interpreted relative frequency as a probability. This is 
permitted, as discussed in Section 5 of Chapter 1. 


24 Example 


Two cards are chosen from a deck. What is the probability that the first is 
an ace and the second is a king?

If we let A₁ be the event “first card is an ace” and let K₂ be the event
“second card is a king,” we wish to find p(A₁ ∩ K₂). By the product rule
we have

$$p(A_1 \cap K_2) = p(A_1) \cdot p(K_2 \mid A_1) = \left(\frac{4}{52}\right)\left(\frac{4}{51}\right) = \frac{4}{663}$$

[We may also find p(K₂ ∩ A₁) = p(K₂) · p(A₁|K₂) to arrive at the same
answer. There is no “time sequence” implied in the product rule.] The
computation is very similar to previous computations. Thus we may say that
there are 4 · 4 ways of succeeding out of 52 · 51 hands. But the above
procedure is more natural, because it involves simple probabilities at each
stage.

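The product rule can be checked against a direct count over all 52 · 51 ordered pairs. The card encoding and helper names below are our own; the sketch also obtains p(K₂) and p(A₁|K₂) by counting, illustrating that the order of the factors does not matter.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]
pairs = list(permutations(deck, 2))        # ordered (first, second) samples

def count(condition):
    return sum(1 for first, second in pairs if condition(first, second))

def ace_then_king(first, second):
    return first[0] == "A" and second[0] == "K"

def second_is_king(first, second):
    return second[0] == "K"

direct = Fraction(count(ace_then_king), len(pairs))

# Product rule (3.19) in the order used in the text: p(A1) * p(K2 | A1).
product_rule = Fraction(4, 52) * Fraction(4, 51)

# The other order, p(K2) * p(A1 | K2), with both factors obtained by counting.
p_K2 = Fraction(count(second_is_king), len(pairs))
p_A1_given_K2 = Fraction(count(ace_then_king), count(second_is_king))
reversed_order = p_K2 * p_A1_given_K2

print(direct, product_rule, reversed_order)
```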

25 Example 


Urn I contains 5 red and 3 green balls. Urn II contains 2 red and 7 green 
balls. One ball is transferred (at random) from urn I to urn II. After stirring, 
1 ball is chosen from urn II. What is the probability that the (final) ball is 
green? 

This is a 2-stage process, and it is natural to proceed by cases. Let R₁ be
the event that a red ball was transferred, and let G₁ be the event that a green
ball was transferred. Similarly, let G₂ be the event that the final ball is green.
The computations of the conditional probabilities are easy³: p(G₂|R₁) = 7/10,
p(G₂|G₁) = 8/10. We also have p(R₁) = 5/8, p(G₁) = 3/8. Using these equations,
we can find p(R₁ ∩ G₂) and p(G₁ ∩ G₂). Thus

$$p(R_1 \cap G_2) = p(R_1) \cdot p(G_2 \mid R_1) = \left(\frac{5}{8}\right)\left(\frac{7}{10}\right) = \frac{35}{80}$$

$$p(G_1 \cap G_2) = p(G_1) \cdot p(G_2 \mid G_1) = \left(\frac{3}{8}\right)\left(\frac{8}{10}\right) = \frac{24}{80}$$

Finally, the event G₂ can happen in two mutually exclusive ways: R₁ ∩ G₂
and G₁ ∩ G₂. Thus, by Equation 3.4, we have

$$p(G_2) = \frac{35}{80} + \frac{24}{80} = \frac{59}{80} \quad (= .738)$$

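The two-stage computation can be mechanized as a walk over the paths of a tree: multiply the branch probabilities along each path (the product rule) and add the paths that end in a green ball. A minimal Python sketch with exact fractions; the function name tree_paths and the dictionary layout are our own.

```python
from fractions import Fraction

urn1 = {"red": 5, "green": 3}
urn2 = {"red": 2, "green": 7}

def tree_paths(urn1, urn2):
    """Yield (transferred color, final color, path probability) for each path."""
    total1 = sum(urn1.values())
    for moved, n_moved in urn1.items():
        p_first = Fraction(n_moved, total1)          # first-stage branch
        after = dict(urn2)
        after[moved] += 1                            # urn II after the transfer
        total2 = sum(after.values())
        for final, n_final in after.items():
            p_second = Fraction(n_final, total2)     # conditional branch
            yield moved, final, p_first * p_second   # product rule (3.19)

paths = list(tree_paths(urn1, urn2))
p_final_green = sum(p for _, final, p in paths if final == "green")
print(p_final_green, float(p_final_green))
```

Because every path of the tree is listed, the four path probabilities also sum to 1, which is a convenient check on a hand-drawn diagram.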

There is an interesting and useful way of illustrating this procedure at a
glance. This is done with the help of a diagram called a “tree diagram” (see
Fig. 3.4). Starting at the extreme left, we form two “branches” corresponding
to the two first stages (R₁ and G₁). Along these branches we put the
probability of going along that branch. At the end of each branch we form
two other branches, corresponding to the two possible second stages. Along
each of these branches we put the conditional probability of proceeding
along that branch, given that we have reached that juncture. Finally, each
path from left to right represents an event whose probability may be computed
by multiplying the probabilities along the branches. (This is the
product rule, Equation 3.19.) In general, we do not even need “stages.” At
each juncture we have branches corresponding to mutually exclusive events.
Then each path corresponds to the intersection of the events at the junctures.

In our example the event “final ball green” corresponds to the 2 paths
terminating at G (darkened in Fig. 3.4), and we add the probabilities along
these paths to find the probability of a path terminating at G.

3 However, see the remarks following Example 20.


3.4 Tree Diagram for Urn Problem


The process may be repeated for more than 2 stages, because we may
apply the product rule repeatedly. We state and prove it for 3 events. The
generalization to more than 3 events is straightforward:


$$p(A \cap B \cap C) = p(A) \cdot p(B \mid A) \cdot p(C \mid A \cap B) \tag{3.20}$$

To prove this equation we write p(A ∩ B ∩ C) = p((A ∩ B) ∩ C) =
p(A ∩ B) · p(C|A ∩ B). We then apply the product rule once again to
p(A ∩ B) to obtain this equation. Diagrammatically, we have the path of
Fig. 3.5.


3.5 Path for A ∩ B ∩ C (branches labeled p(A), p(B|A), and p(C|A ∩ B), passing through A, B, and C)

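Equation 3.20 can be verified by enumeration. In the sketch below the three events (the first, second, and third cards drawn are hearts) are our own illustrative choice, and the helper names prob, cond, and both are hypothetical; each conditional probability is computed from Equation 3.15 over all 52 · 51 · 50 ordered samples, and the product is compared with a direct count.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]
triples = list(permutations(deck, 3))          # 52 * 51 * 50 ordered samples

def prob(event):
    """p(E) in the uniform space of ordered triples."""
    return Fraction(sum(1 for t in triples if event(t)), len(triples))

def both(e, f):
    """The intersection of two events, as a predicate."""
    return lambda t: e(t) and f(t)

def cond(event, given):
    """p(E | G) by Equation 3.15."""
    return prob(both(event, given)) / prob(given)

# Illustrative events (our own choice): the i-th card drawn is a heart.
def first_heart(t):
    return t[0][1] == "H"

def second_heart(t):
    return t[1][1] == "H"

def third_heart(t):
    return t[2][1] == "H"

A, B, C = first_heart, second_heart, third_heart
lhs = prob(both(both(A, B), C))                    # p(A ∩ B ∩ C), direct count
rhs = prob(A) * cond(B, A) * cond(C, both(A, B))   # Equation 3.20
print(lhs, rhs)
```

Both sides reduce to (13 · 12 · 11)/(52 · 51 · 50) = 11/850.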

26 Example 


Amy is happy 50 percent of the time, sad 20 percent of the time, and neutral
30 percent of the time. She is going shopping for a dress. When she is happy,
she will buy a dress with probability .9. When sad, she will buy one with
probability .1. Otherwise, she will buy a dress with probability .4. What is
the probability that she will buy a dress?


3.6 Buying a Dress (paths: Happy → Dress, .45; Sad → Dress, .02; Neutral → Dress, .12; total .59)

The entire analysis is summarized in Fig. 3.6. We do not include the
branches where no dress is bought. The probability is .45 + .02 + .12 = .59, or 59 percent.

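The tree in Fig. 3.6 amounts to one product per path and a sum over the paths. A minimal Python sketch with exact fractions, using the numbers of Example 26 (the dictionary layout is our own):

```python
from fractions import Fraction

# Amy's moods and the conditional chance of buying a dress in each (Example 26).
p_mood = {"happy": Fraction(5, 10), "sad": Fraction(2, 10), "neutral": Fraction(3, 10)}
p_dress_given = {"happy": Fraction(9, 10), "sad": Fraction(1, 10), "neutral": Fraction(4, 10)}

# Each path's probability is a product along its branches (Equation 3.19).
path = {mood: p_mood[mood] * p_dress_given[mood] for mood in p_mood}
p_dress = sum(path.values())      # the paths are mutually exclusive, so add them
print({mood: float(p) for mood, p in path.items()}, float(p_dress))
```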

27 Example 
Suppose that in Example 26, Amy comes home with a dress. What is the 
probability she was happy? 

This is a conditional probability problem. We must find p(happy|dress). 
Using Equation 3.15 and referring to Fig. 3.6, we have 


$$p(\text{happy} \mid \text{dress}) = \frac{p(\text{happy} \cap \text{dress})}{p(\text{dress})} = \frac{.45}{.59} = \frac{45}{59} \quad (= 76.3\%)$$


Thus the purchase of a dress may be used as evidence of Amy’s happiness. 

We were given p(dress|happy) and we found p(happy|dress). The differ- 
ence between the two is striking, both numerically and conceptually. We 
think of p(dress|happy), p(dress|sad), etc., as “before the fact” (a priori)
conditional probabilities. The probabilities p(happy|dress), p(sad|dress), etc., 
are thought of as “after the fact” (a posteriori) probabilities. Thus, after we 
receive the information that the dress was bought, we may recompute the 
probability of “happy,” given that this event occurred. The whole procedure 
is generalized in the following theorem. 


28 Theorem. (Bayes’ Theorem) 
If the sample space S is the union of the mutually exclusive events E₁, ...,
E_s, then

$$p(E_i \mid A) = \frac{p(E_i) \cdot p(A \mid E_i)}{p(E_1) \cdot p(A \mid E_1) + \cdots + p(E_s) \cdot p(A \mid E_s)} \tag{3.21}$$


To prove this formula, we use Equation 3.15:

$$p(E_i \mid A) = \frac{p(E_i \cap A)}{p(A)}$$

By the product rule, p(E_i ∩ A) = p(E_i) · p(A|E_i). The event A is the
union of the mutually exclusive events A ∩ E₁, ..., A ∩ E_s. (Thus A can
happen in s mutually exclusive ways: with E₁, with E₂, ..., or with E_s.)
Hence, by Theorem 3.10, p(A) = p(A ∩ E₁) + ⋯ + p(A ∩ E_s). Finally,
we may apply the product rule to each of these summands to obtain Equation
3.21. The entire procedure is summarized in Fig. 3.7. In practice we may
use a figure similar to this rather than memorize Equation 3.21.


3.7 Bayes’ Theorem (branches p(E₁), ..., p(E_s) lead to E₁, ..., E_s, each followed by a branch to A with p(E_i ∩ A) = p(E_i) · p(A|E_i); summing the paths gives p(A) = p(E₁) · p(A|E₁) + ⋯ + p(E_s) · p(A|E_s))

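Equation 3.21 translates directly into a small function. The sketch below is a minimal Python illustration; the function name bayes and its dictionary inputs are hypothetical conveniences. Applied to the data of Examples 26 and 27, it should recover p(happy|dress) = 45/59.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return {E_i: p(E_i | A)} by Equation 3.21.

    priors[e] = p(E_i) for mutually exclusive events whose union is S;
    likelihoods[e] = p(A | E_i).
    """
    if sum(priors.values()) != 1:
        raise ValueError("the E_i must exhaust S, so their probabilities sum to 1")
    joint = {e: priors[e] * likelihoods[e] for e in priors}   # p(E_i ∩ A), product rule
    p_a = sum(joint.values())                                  # p(A), the denominator
    if p_a == 0:
        raise ValueError("p(A) = 0, so p(E_i | A) is not defined")
    return {e: joint[e] / p_a for e in joint}

posterior = bayes(
    priors={"happy": Fraction(1, 2), "sad": Fraction(1, 5), "neutral": Fraction(3, 10)},
    likelihoods={"happy": Fraction(9, 10), "sad": Fraction(1, 10), "neutral": Fraction(2, 5)},
)
print(posterior["happy"], round(float(posterior["happy"]), 3))
```

The a posteriori probabilities returned for all the E_i always sum to 1, since each is a share of the same denominator p(A).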

29 Example 


The percentage of freshmen, sophomores, juniors, and seniors in Desolation 
High School is 30, 30, 20, and 20 percent, respectively. The probability 
that a freshman takes and passes mathematics is .90. Similarly, the prob- 
abilities for the other classes are, respectively, .80, .70, and .40. A student 
chosen at random at the end of the term has just taken and passed mathe- 
matics. Find the probability that he is a senior. 

This is a clear-cut application of Bayes’ theorem. The computation and
method are given in Fig. 3.8. The probability that a student has taken and
passed math is .73. The required probability is .08/.73 = 8/73 (= .110).


3.8 (paths: Fr. → math, .27; Soph. → math, .24; Jr. → math, .14; Sr. → math, .08; total .73)

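Example 29 follows the same pattern as Examples 26 and 27. A minimal sketch in Python with exact fractions (the class labels follow Fig. 3.8; the data layout is our own):

```python
from fractions import Fraction

p_class = {"Fr.": Fraction(30, 100), "Soph.": Fraction(30, 100),
           "Jr.": Fraction(20, 100), "Sr.": Fraction(20, 100)}
p_pass_given = {"Fr.": Fraction(90, 100), "Soph.": Fraction(80, 100),
                "Jr.": Fraction(70, 100), "Sr.": Fraction(40, 100)}

joint = {c: p_class[c] * p_pass_given[c] for c in p_class}   # tree-diagram paths
p_pass = sum(joint.values())                                  # .27 + .24 + .14 + .08
p_senior_given_pass = joint["Sr."] / p_pass                   # Bayes' theorem (3.21)
print(float(p_pass), p_senior_given_pass, round(float(p_senior_given_pass), 3))
```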

Finally, let us consider a “case-analysis” probability problem solved with
the help of a tree diagram.


30 Example
Three cards are successively drawn from a deck. A player wins if the first
card is an ace, or if the first 2 cards are pictures (J, Q, K), or if all 3 cards
have the same suit. What is his probability of winning?
3.9 Tree Diagram for Card Game
(Code: A, ace; P, picture card; S, same suit as previous card. Winning paths over Card 1, Card 2, Card 3, with probabilities: A, 1/13 = .0769; P then P, (3 · 11)/(13 · 51) = .0498; P then P̄ ∩ S then S, (3 · 10 · 11)/(13 · 51 · 50) = .0100; neither ace nor picture, then S, then S, (9 · 12 · 11)/(13 · 51 · 50) = .0358. Total p = .1725.)


Figure 3.9 gives an appropriate tree diagram and solution. One word of
caution is in order. We are using P for “picture card” and S for “same suit”
in a generic way. These are not events. Thus the branch from P (card 1) to
P̄ ∩ S (card 2) refers to the case that the first card is a picture and the
second card is the same suit as the first but not a picture card. Here P stands
for the event P₁ (first card is a picture card) and P̄ ∩ S stands for P̄₂ ∩ S₂
(second card is not a picture and the first 2 cards have the same suit). The
probability 10/51 along this branch is the conditional probability p(P̄₂ ∩ S₂|P₁).
Similar explanations are in order for all the other branches. In applications,
we are often not too explicit about the underlying sample space or even the
events in question.

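Because the tree in Fig. 3.9 is a case analysis, it is worth confirming that its cases cover every winning draw exactly once. The Python sketch below (the card encoding and the function name wins are our own) compares the sum over the four winning paths with a brute-force count over all 52 · 51 · 50 ordered draws.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for suit in "SHDC" for rank in "A23456789TJQK"]
PICTURES = set("JQK")

def wins(cards):
    first, second, third = cards
    if first[0] == "A":                                   # first card an ace
        return True
    if first[0] in PICTURES and second[0] in PICTURES:    # first 2 cards pictures
        return True
    return first[1] == second[1] == third[1]              # all 3 of the same suit

triples = list(permutations(deck, 3))
by_count = Fraction(sum(1 for t in triples if wins(t)), len(triples))

# The four winning paths of Fig. 3.9, each a product of branch probabilities.
by_tree = (Fraction(1, 13)
           + Fraction(3 * 11, 13 * 51)
           + Fraction(3 * 10 * 11, 13 * 51 * 50)
           + Fraction(9 * 12 * 11, 13 * 51 * 50))

print(by_count, by_tree, round(float(by_tree), 4))
```

Both routes give 953/5525, which rounds to the .1725 of the figure.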

EXERCISES 


1. If, at a certain time, 60 percent of all television owners are watching 
television, and 32 percent of the watchers are watching a musical comedy, 
and of these only 20 percent are paying attention, what percentage of 
television owners are attentively watching a musical comedy? 


2. Eighty percent of all the apples in a market are expensive. Ten per- 
cent of the expensive apples are tasteless. What percentage of the apples 
are tasteless and expensive? 


3. A stamp collector orders 80 percent of his stamps from dealer X and 
20 percent from dealer Y. He knows that X sends him a faulty stamp with 
probability .05, whereas Y sends him a faulty stamp with probability .15. 
What is the probability of receiving a faulty stamp? If a stamp is found to be 
faulty, what is the probability it came from dealer X? 
4. Factories A, B, and C produce 30, 20, and 50 percent of a manufacturer’s
transistor radios. Quality control varies in these factories, so
that 5, 7, and 2 percent of the output in factories A, B, and C are defective, 
respectively. A radio from a factory is discovered to be defective. Compute 
the probabilities that it came from factory A, from B, or from C. 


5. In Example 25 suppose that the final ball is green. What is the prob- 
ability that it was the ball from urn I? 


6. Players A, B, and C are playing a chess tournament. Assume that A 
and B are equally matched and that both can beat C with probability .6. 
(Draw games are replayed.) In the tournament, A and B play each other, 
and then C plays the winner. Find the respective probabilities that A, B, or C 
wins the tournament. If the rules are changed so that A plays the winner of a 
B-C game, what is the probability of winning for each player? 


7. The probability of skidding in the snow at a certain intersection is 
.1 if the car has snow tires but .4 if the car is not equipped with snow tires.
A safety inspector, observing that intersection in the snow, discovers that 
10 percent of the skidders have snow tires and the remaining skidders do not. 
The inspector wishes to estimate the percentage of cars in the city equipped 
with snow tires. Estimate this percentage, and state what assumptions you 
are using to derive this estimate. 


8. Urn I contains 3 red and 7 black balls. Urn II contains 5 red and 5 
black balls. They are identical in appearance. One urn is chosen at random 
and a ball is chosen from that urn. It is found to be black and is discarded. 
Another ball is chosen from that urn. What is the probability that it is 
black? What is the probability that urn I was chosen? 


9. A bag contains 9 ordinary coins and 1 two-headed coin. A coin is 
chosen at random from that bag and is tossed 3 times, each toss yielding a 
head. What is the probability that the coin is two-headed? What is the 
probability that the next toss will be a head? 


10. In Exercise 9 how many consecutive tosses of heads are necessary 
to deduce that the coin is two-headed with probability greater than .95? 


11. When playing poker, a player discovers that Donald bluffs, at random, 
3 Of the time. When bluffing he bets high 90 percent of the time and low the 
other 10 percent of the time. When not bluffing he bets high 50 percent of 
the time and low 50 percent of the time. Donald’s opponent has observed
that Donald is betting high. What is the probability that he is bluffing? 


12. A mixture of grass seed contains 40 percent bluegrass, 55 percent 
ryegrass, and 5 percent crabgrass. On a certain lawn, the probability that 
bluegrass germinates is .4; the probability for ryegrass and crabgrass is .8 
and .99, respectively. When this seed is used on that lawn, what percentage
of the grass will be bluegrass? ryegrass? crabgrass? 


13. Three cards are drawn from a deck. What is the probability of drawing 
3 pictures, or 3 cards of the same suit, or a straight (ace high or ace low)? 


14. An integer n is chosen at random between 1 and 5 inclusive. Once
the number n is determined, an integer m is chosen at random between 1 and
n inclusive. What is the probability that the second number is 1?


4 INDEPENDENCE 


When we perform a series of probability experiments, we take care that 
the results of the experiments are independent of one another. Thus, if we 
choose one card from a deck to observe its color, we must make sure that 
when the experiment is repeated, we replace the card and shuffle well. If we 
were sloppy and put the card in the middle of the deck before proceeding to 
the new experiment, the probability of a particular color would depend on 
the color of the first card. We would then be dealing with conditional
probabilities, because we know that the second card will not be the original card.
We now formalize the notion of independent events. Roughly speaking,
events A and B are independent if the probability of A is unchanged even if
it is known that B happened.


31 Definition 
Let S be a sample space and let A and B be events. We say that A and B are
independent if

$$p(A) = p(A \mid B) \qquad (3.22)$$


We can put this equation into a more symmetric form. By Equation 3.15
we may write $p(A \mid B) = p(A \cap B)/p(B)$. Clearing fractions, we have

$$p(A \cap B) = p(A)\,p(B) \qquad (A \text{ and } B \text{ independent}) \qquad (3.23)$$

Conversely, we may divide this equation by $p(B)$ to obtain Equation 3.22.
Equation 3.23 is, however, slightly more general than Equation 3.22 because
it includes the case $p(B) = 0$, which was implicitly excluded in the original
definition. We shall take the more general Equation 3.23 as the criterion for
independence, thereby extending Definition 31 to include the case $p(B) = 0$.
If $p(A \cap B) \neq p(A)\,p(B)$, we shall say that A and B are dependent.

The first observation we make is that the condition that A and B are
independent is symmetric with respect to A and B. Thus, if A and B are
independent, so are B and A. This follows immediately from Equation 3.23,
because $A \cap B = B \cap A$. In the card example above, if we let $R_1$ = red
card on first experiment and $R_2$ = red card on second experiment, then $R_2$
is independent of $R_1$, and $R_1$ is also independent of $R_2$. (The sample space
is taken to be all ordered couples of cards with replacement.)

If Definition 31 is applied to a frequency table (interpreting relative fre- 
quencies as probabilities), we see that dependent events sometimes indicate 
a causal effect. In Table 3.2, suppose we let M = unemployed males over 
20 and S = unemployed people 65 years and over. We have f(M) = 65.6
percent and f(M|S) = 3.0/3.8 = 78.9 percent. Thus a disproportionate
number of males in the older category are unemployed. We can say that M
and S are dependent, but we must be careful about saying that being over
65 tends to cause males to be unemployed. In fact, S and M are also dependent,
and we might also say that being over 65 and unemployed causes a
person to be a male. The same argument raged (and will no doubt continue
to rage) on whether cigarette smoking causes lung cancer. The undisputed
evidence is that the event that a person is a smoker is dependent on the
event that a person has lung cancer. It is certainly also theoretically con- 
ceivable that people prone to lung cancer happen to be more attracted to 
cigarettes, or that a common cause (perhaps nervousness, intelligence, etc.) 
is the reason for that dependence. Causal effects are difficult to demon- 
strate, but dependence of events can often be easily demonstrated. The 
intelligent person distinguishes between the two concepts. 

We may generalize Definition 31 and Equation 3.23 to independence of
several events $A_1, \dots, A_s$.


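32 Definition

The events $A_1, \dots, A_s$ are said to be independent if

$$
\begin{aligned}
p(A_i \cap A_j) &= p(A_i)\,p(A_j), & 1 \le i < j \le s \\
p(A_i \cap A_j \cap A_k) &= p(A_i)\,p(A_j)\,p(A_k), & 1 \le i < j < k \le s \\
&\;\;\vdots \\
p(A_1 \cap \cdots \cap A_s) &= p(A_1) \cdots p(A_s)
\end{aligned}
\qquad (3.24)
$$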


The first series of equations (the events taken 2 at a time) implies that any
2 of the events are independent. The second series, together with the first,
shows that any event is independent of the simultaneous occurrence of 2
of the others: $p(A_i \mid A_j \cap A_k) = p(A_i)$ for distinct i, j, and k. (Use Equation
3.20 for a proof.) Similar interpretations are possible for all these equations.


In all, there are $\binom{s}{2}$ equations using 2 different sets, $\binom{s}{3}$ equations using 3
different sets, etc. Thus there are $\binom{s}{2} + \binom{s}{3} + \cdots + \binom{s}{s}$ equations in all. By
Equation 2.16, this may be simplified to $2^s - s - 1$. Thus we have $2^s - s - 1$
equations that define independence. For s = 2 we had $2^2 - 2 - 1 = 1$ equation.
For s = 3 there are $2^3 - 3 - 1 = 4$ equations.
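As a quick check of this count, the following minimal Python sketch sums the binomial coefficients directly and compares the result with the closed form $2^s - s - 1$. The function name `num_independence_equations` is our own, chosen only for illustration.

```python
from math import comb

def num_independence_equations(s: int) -> int:
    """Number of product equations in Definition 32 for s events:
    one equation for each subset of size 2, 3, ..., s."""
    return sum(comb(s, r) for r in range(2, s + 1))

for s in range(2, 7):
    # Closed form obtained from Equation 2.16: 2^s - s - 1
    assert num_independence_equations(s) == 2**s - s - 1
    print(s, num_independence_equations(s))
```

For s = 2 through 6 this prints 1, 4, 11, 26, and 57 equations.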
33 Example

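Two dice are tossed. Let events A, B, and C be defined as follows: A = “sum
is 7,” B = “first die is 6,” C = “second die is 4 or over.” We may compute
$p(A) = \frac{1}{6}$, $p(B) = \frac{1}{6}$, $p(C) = \frac{1}{2}$, $p(A \cap B) = \frac{1}{36}$, $p(B \cap C) = \frac{1}{12}$, $p(A \cap C) = \frac{1}{12}$,
$p(A \cap B \cap C) = 0$. Thus

$$
\begin{aligned}
p(A \cap B) &= p(A)\,p(B) \\
p(A \cap C) &= p(A)\,p(C) \\
p(B \cap C) &= p(B)\,p(C)
\end{aligned}
$$

but

$$p(A \cap B \cap C) \neq p(A)\,p(B)\,p(C)$$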


Thus these 3 events are dependent, but any 2 of them are independent. 
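The computations of Example 33 can be confirmed by brute force. The minimal sketch below enumerates the 36 equally likely ordered couples, represents each event as a predicate (the helper names `p` and `both` are our own), and tests all three pairwise product rules and the triple product rule.

```python
from fractions import Fraction
from itertools import product

# Uniform sample space for two dice: 36 ordered couples.
S = list(product(range(1, 7), repeat=2))

def p(event):
    """Probability of an event (a predicate on outcomes) in the uniform space S."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

def both(*events):
    """Intersection of events given as predicates."""
    return lambda w: all(e(w) for e in events)

A = lambda w: w[0] + w[1] == 7   # sum is 7
B = lambda w: w[0] == 6          # first die is 6
C = lambda w: w[1] >= 4          # second die is 4 or over

pairwise = (
    p(both(A, B)) == p(A) * p(B),
    p(both(A, C)) == p(A) * p(C),
    p(both(B, C)) == p(B) * p(C),
)
triple = p(both(A, B, C)) == p(A) * p(B) * p(C)
print(pairwise, triple)  # (True, True, True) False
```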
Independent events can occur quite abstractly. For example, consider the 
sample space given by the table 




If $A = \{x, z\}$, $B = \{x, y\}$, then $A \cap B = \{x\}$. We have $p(A) = p(x) + p(z) = .2 + .3 = .5$,
$p(B) = .2 + .2 = .4$, and $p(A \cap B) = .2$. Hence, since $.2 = .5 \times .4$,
we have $p(A \cap B) = p(A)\,p(B)$. The events A and B are independent.


34 Example 


A die is tossed repeatedly until a 6 turns up. What is the probability $p_k$ that
the 6 first occurs at the kth throw (k = 1, 2, ...)?

We note that we have an infinite sample space. We first reduce to a finite
space by choosing a large number N and restricting ourselves to at most N
tosses. In practice, N = 100 gives an excellent chance that a 6 will turn up.
If we let $A_k$ = the event that a 6 occurs on the kth trial, the problem calls for
the probability $p(\bar A_1 \cap \bar A_2 \cap \cdots \cap \bar A_{k-1} \cap A_k)$ for k = 1, 2, ..., N. We
make the following reasonable assumptions:

1. The events $A_1, A_2, \dots, A_{N-1}, A_N$ are independent. (This is our way of
interpreting the notion that what happened on any of the previous trials
does not affect the probability on a given trial.)

2. $p(A_i) = \frac{1}{6}$. (Thus, on any one trial, the 6 outcomes 1 through 6 are
equally likely.)

Thus, by Equation 3.24, we have

$$p_k = p(\bar A_1 \cap \cdots \cap \bar A_{k-1} \cap A_k) = \tfrac{5}{6} \cdot \tfrac{5}{6} \cdots \tfrac{5}{6} \cdot \tfrac{1}{6} = \tfrac{1}{6}\left(\tfrac{5}{6}\right)^{k-1} \qquad (k = 1, 2, \dots, N)$$

The probability that no 6 occurs during the N trials is $(\frac{5}{6})^N$. (For N = 100 this
probability is $1.2 \times 10^{-8}$.) We also note that $p(\bar A_1 \cap \cdots \cap \bar A_{k-1} \cap A_k)$ is
independent of N for k ≤ N. It is therefore reasonable to take $\frac{1}{6}(\frac{5}{6})^{k-1}$ as the
required answer. These probabilities are given in Table 1.6 to 3 decimal places. In that
table, we chose N = 12, so there was a reasonable chance that no 6 would occur: $(\frac{5}{6})^{12} = .112$.
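The formula $p_k = \frac{1}{6}(\frac{5}{6})^{k-1}$ and the "no 6 in N tosses" probability are easy to tabulate exactly. The minimal sketch below (function names are our own) also checks that the events "first 6 at toss k" for k = 1, ..., N, together with "no 6 at all," account for total probability 1.

```python
from fractions import Fraction

def first_six_at(k: int) -> Fraction:
    """p_k = (5/6)^(k-1) * (1/6): k-1 non-sixes followed by a six."""
    return Fraction(5, 6) ** (k - 1) * Fraction(1, 6)

def no_six(N: int) -> Fraction:
    """Probability that no 6 occurs in N independent tosses."""
    return Fraction(5, 6) ** N

N = 12
# "First 6 at toss k" (k = 1..N) and "no 6 in N tosses" partition the space.
assert sum(first_six_at(k) for k in range(1, N + 1)) + no_six(N) == 1

print([round(float(first_six_at(k)), 3) for k in range(1, 4)])  # [0.167, 0.139, 0.116]
print(round(float(no_six(12)), 3))                               # 0.112
print(float(no_six(100)))                                        # roughly 1.2e-08
```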

The results obtained in this example could be computed using the uniform 
probability space $A \times \cdots \times A$ (N factors), where $A = \{1, 2, \dots, 6\}$. However,
the present method is preferable for at least 2 reasons: First, it involves
a simple and direct calculation. Second, assumptions 1 and 2 above give a
simple mathematical translation of the underlying assumptions. A simple 
computation which also goes to the heart of the problem is always a happy 
occasion. 
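To see that the uniform product space gives the same answer, one can enumerate it for a small N. This sketch assumes N = 4 only so that all $6^4 = 1296$ outcomes can be listed; the helper name `p_first_six_at` is our own.

```python
from fractions import Fraction
from itertools import product

N = 4  # kept small so the 6**N outcomes can be enumerated directly
outcomes = list(product(range(1, 7), repeat=N))  # uniform space A x ... x A

def p_first_six_at(k: int) -> Fraction:
    """Fraction of N-tuples whose first 6 is in position k (1-based)."""
    hits = sum(1 for w in outcomes if w[k - 1] == 6 and 6 not in w[:k - 1])
    return Fraction(hits, len(outcomes))

for k in range(1, N + 1):
    # Agrees with the direct answer (1/6)(5/6)^(k-1), independently of N >= k.
    assert p_first_six_at(k) == Fraction(1, 6) * Fraction(5, 6) ** (k - 1)
```

The counting route needs all $6^N$ tuples, whereas the product rule needs only k factors, which is the practical side of the preference stated above.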

The following theorem gives some of the algebraic properties of indepen- 
dent events. 


35 Theorem 

(a) If A and B are independent, then so are $\bar A$ and B. More generally, if
$A_1, \dots, A_s$ are independent, so are $\bar A_1, A_2, \dots, A_s$. (b) If A, B, and C are
independent, then $A \cap B$ and C are independent, as are $A \cup B$ and C. More
generally, if $A_1, \dots, A_s$ are independent, then $A_1 \cap A_2, A_3, \dots, A_s$ are
independent, and so are $A_1 \cup A_2, A_3, \dots, A_s$.

To prove part (a), suppose A and B are independent. In general, $\bar A \cap B = B - (A \cap B)$.
(The points in B and not in A are the points of B less the
points of A that are in B. See Fig. 3.10.) Also $(A \cap B) \subset B$. Thus, using
Equations 3.4, 3.23, and 3.2, we obtain

$$
\begin{aligned}
p(\bar A \cap B) &= p(B - (A \cap B)) = p(B) - p(A \cap B) \\
&= p(B) - p(A)\,p(B) \\
&= [1 - p(A)]\,p(B) \\
&= p(\bar A)\,p(B)
\end{aligned}
$$

This proves that $\bar A$ and B are independent. The proof for independent
events $A_1, \dots, A_s$ is similar and is left to the reader.


3.10 $\bar A \cap B = B - (A \cap B)$
To prove the result on intersections and unions, we note that if A, B, and
C are independent, then clearly $A \cap B$ and C are independent, because
$p((A \cap B) \cap C) = p(A)\,p(B)\,p(C) = p(A \cap B)\,p(C)$. We may prove that
$A \cup B$ and C are independent by applying these results on complements
and on intersections, as well as De Morgan’s laws (Equation 3.6). Thus we have

If A, B, C are independent, then $\bar A$, B, C are independent.
Hence $\bar A$, $\bar B$, C are independent.
Hence $\bar A \cap \bar B$, C are independent.
Hence $\overline{\bar A \cap \bar B}$, C are independent.
Hence $A \cup B$, C are independent.

The last step used the obvious formula $\bar{\bar A} = A$ and De Morgan’s laws. The
result for s independent events is proved similarly.

We may apply this theorem several times to obtain a result such as: If
A, B, C, D, and E are independent, then $A \cup B$, $D \cap E$, and C are independent.
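Theorem 35 can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it manufactures three independent events A, B, C as "coordinate i equals 1" in a product of three two-point spaces (anticipating the product construction of Section 5), using hypothetical probabilities 1/3, 1/4, and 2/5, and then verifies parts (a) and (b) exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probabilities for three independent events A, B, C.
probs = (Fraction(1, 3), Fraction(1, 4), Fraction(2, 5))

# Each outcome w in {0,1}^3 gets the product of its coordinate probabilities.
space = {}
for w in product((0, 1), repeat=3):
    weight = Fraction(1)
    for bit, q in zip(w, probs):
        weight *= q if bit else 1 - q
    space[w] = weight

def p(event):
    return sum(weight for w, weight in space.items() if event(w))

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[2] == 1
not_A = lambda w: not A(w)
A_or_B = lambda w: A(w) or B(w)

# Part (a): the complement of A is independent of B.
assert p(lambda w: not_A(w) and B(w)) == p(not_A) * p(B)
# Part (b): A ∩ B and C are independent; so are A ∪ B and C.
assert p(lambda w: A(w) and B(w) and C(w)) == p(lambda w: A(w) and B(w)) * p(C)
assert p(lambda w: A_or_B(w) and C(w)) == p(A_or_B) * p(C)
```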


36 Example 

Six dice are tossed. What is the probability that at least one 1 appears? How
many dice are required to assure a better-than-even chance that at least
one 1 appears?

If A is the event that at least one 1 appears, it is easier to consider the
complementary event $\bar A$ that no 1 appears. This event equals the intersection
of the events $\bar A_1$ (no 1 on the first die), $\bar A_2$ (no 1 on the second die), etc. Thus,
since the $\bar A_i$ are independent, $p(\bar A) = p(\bar A_1 \cap \cdots \cap \bar A_6) = (\frac{5}{6})^6 = .335$. The
required probability is therefore $p(A) = 1 - p(\bar A) = .665$. For the second
part, we compute that $(\frac{5}{6})^3 = .58$, while $(\frac{5}{6})^4 = .48$. Thus 4 dice are required
if a better-than-even chance is desired.
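Both parts of Example 36 reduce to the expression $1 - (\frac{5}{6})^n$. The minimal sketch below (function names are our own) evaluates it exactly and searches for the smallest number of dice giving a better-than-even chance.

```python
from fractions import Fraction

def p_at_least_one_1(n: int) -> Fraction:
    """1 - p(no 1 on any of n independent dice) = 1 - (5/6)^n."""
    return 1 - Fraction(5, 6) ** n

def dice_needed(target: Fraction = Fraction(1, 2)) -> int:
    """Smallest n with p(at least one 1) > target."""
    n = 1
    while p_at_least_one_1(n) <= target:
        n += 1
    return n

print(round(float(p_at_least_one_1(6)), 3))  # 0.665
print(dice_needed())                         # 4
```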

In this problem $A_i$ was the event “1 on the ith die,” and we wished to
compute $p(A)$, where $A = A_1 \cup \cdots \cup A_6$. The complement was
$\bar A = \bar A_1 \cap \cdots \cap \bar A_6$. This is a general situation (complements take unions into
intersections and vice versa) and was already mentioned in Equation 3.6.
The more general equations are also called De Morgan’s laws:

$$\overline{A_1 \cup \cdots \cup A_s} = \bar A_1 \cap \cdots \cap \bar A_s, \qquad \overline{A_1 \cap \cdots \cap A_s} = \bar A_1 \cup \cdots \cup \bar A_s \qquad (3.25)$$

The proofs are left to the reader.
The procedure in Example 36 may be generalized. 
37 Theorem 
If $A_1, \dots, A_s$ are independent events, with probabilities $p_1, \dots, p_s$,
respectively, then

$$p(A_1 \cup A_2 \cup \cdots \cup A_s) = 1 - (1 - p_1)(1 - p_2) \cdots (1 - p_s) \qquad (3.26)$$
For example,

$$p(A_1 \cup A_2) = 1 - (1 - p_1)(1 - p_2) = 1 - (1 - p_1 - p_2 + p_1 p_2) = p_1 + p_2 - p_1 p_2$$

which is the form of Equation 3.5 in this case, because $p(A_1 \cap A_2) = p(A_1)\,p(A_2)$.

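To prove Equation 3.26 we first note that $\bar A_1, \dots, \bar A_s$ are independent by
Theorem 35. Hence

$$p(\bar A_1 \cap \cdots \cap \bar A_s) = p(\bar A_1) \cdots p(\bar A_s) = (1 - p_1) \cdots (1 - p_s)$$

But, by Equation 3.25, $\bar A_1 \cap \cdots \cap \bar A_s = \overline{A_1 \cup \cdots \cup A_s}$. Thus

$$p(\overline{A_1 \cup \cdots \cup A_s}) = (1 - p_1) \cdots (1 - p_s)$$

Finally, we obtain the result using Theorem 5.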
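Equation 3.26 can be compared against a direct enumeration. In the sketch below the events are modeled, as an assumption for illustration, by coordinates of a product of two-point spaces, so that they are independent by construction; the probabilities in `ps` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

def union_prob_formula(ps):
    """Equation 3.26: p(A_1 ∪ ... ∪ A_s) = 1 - (1-p_1)...(1-p_s)."""
    return 1 - prod(1 - q for q in ps)

def union_prob_enumerated(ps):
    """Sum the product-space probabilities of all outcomes in which
    at least one of the independent events occurs."""
    total = Fraction(0)
    for w in product((0, 1), repeat=len(ps)):
        if any(w):
            weight = Fraction(1)
            for bit, q in zip(w, ps):
                weight *= q if bit else 1 - q
            total += weight
    return total

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)]  # hypothetical values
assert union_prob_formula(ps) == union_prob_enumerated(ps)
```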
Another example, taken from the theory of numbers, illustrates the 
power of Theorem 35. 


38 Example 
How many integers from 1 through 120 inclusive have no factor (except 1)
in common with 120?

We factor 120 into primes: $120 = 2^3 \cdot 3 \cdot 5$. We see that we are looking for
integers not divisible by 2, 3, or 5. Let S be the (uniform) probability space
of integers from 1 through 120. Let $A_2$ be the event “divisible by 2,” and in
general let $A_i$ be the event “divisible by i.” Then if i is a divisor of 120,
$p(A_i) = 1/i$. To see this, note that the numbers divisible by i are precisely
$1 \cdot i, 2 \cdot i, \dots, (120/i) \cdot i$. There are 120/i such numbers and 120 sample points.
The quotient is 1/i.

Now we claim that $A_2$, $A_3$, and $A_5$ are independent. To see this, note that
$A_2 \cap A_3$ consists of those numbers divisible by 2 and 3, hence by 6. Thus
$A_2 \cap A_3 = A_6$, and $p(A_2 \cap A_3) = p(A_2)\,p(A_3)$ since $\frac{1}{6} = (\frac{1}{2})(\frac{1}{3})$. Similar
calculations apply for $A_2 \cap A_5$, $A_3 \cap A_5$, and $A_2 \cap A_3 \cap A_5$. We are looking for
$n(\bar A_2 \cap \bar A_3 \cap \bar A_5)$. Since $\bar A_2$, $\bar A_3$, and $\bar A_5$ are independent (Theorem 35), we
have

$$p(\bar A_2 \cap \bar A_3 \cap \bar A_5) = \left(1 - \tfrac{1}{2}\right)\left(1 - \tfrac{1}{3}\right)\left(1 - \tfrac{1}{5}\right)$$

Thus, since there are 120 sample points, we have, by Equation 1.15,

$$n(\bar A_2 \cap \bar A_3 \cap \bar A_5) = 120 \cdot \tfrac{1}{2} \cdot \tfrac{2}{3} \cdot \tfrac{4}{5} = 32$$

Thus, in contrast to our previous attempts at computing probabilities by
counting, we have here counted by computing probabilities.


The general situation in number theory is as follows. We let $\varphi(n)$ be the
number of integers from 1 through n which have no factor, except 1, in
common with n. We let $p_1, \dots, p_r$ be all the distinct prime factors of n. Then

$$\varphi(n) = n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_r}\right) \qquad (3.27)$$
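Equation 3.27 translates directly into integer arithmetic. The minimal sketch below (the function names are our own) computes $\varphi(n)$ from the distinct prime factors and checks it against a brute-force count using `math.gcd`. Dividing by p before multiplying by p − 1 keeps every intermediate value an exact integer, because each remaining prime still divides the running result.

```python
from math import gcd

def prime_factors(n: int) -> list:
    """Distinct prime factors of n by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def phi_formula(n: int) -> int:
    """Equation 3.27: phi(n) = n (1 - 1/p_1) ... (1 - 1/p_r), in exact integers."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result

def phi_brute(n: int) -> int:
    """Count 1 <= k <= n with no factor except 1 in common with n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print(prime_factors(120), phi_formula(120))  # [2, 3, 5] 32
assert all(phi_formula(n) == phi_brute(n) for n in range(1, 500))
```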
EXERCISES 


1. If a card is chosen from a deck, show that the events “red” and “ace”
are independent events. 


2. In Exercise 1 are the events “red,” “ace,” and “diamond or club” 
independent? Show your calculations. 


3. Two dice are tossed. Determine whether the events “6 on first die”
and “sum is 8” are independent. 


4. If $p(A \cap B \cap C) = p(A)\,p(B)\,p(C)$, are A, B, and C independent?
Give reasons. (Hint: Construct an appropriate probability space, with A = 
{x,a}, B = {x, b}, C = {x, c}, and S = {a, b,c, d, x}.) 


5. If S is a uniform probability space with k elements, and if A and B are 
independent events, find a formula relating $n(A)$, $n(B)$, and $n(A \cap B)$.
Generalize to more than 2 sets. 


6. Prove: If $A_1, \dots, A_s$ are independent, then $A_1, \dots, A_{s-1}$ are independent.


7. Prove: If $A_1, \dots, A_s$ are independent, then $\varnothing, A_1, \dots, A_s$ are independent.

8. A coin has probability p of landing heads and probability 1—p of 
landing tails. Assuming that 0 < p < 1, prove that $p(HT \mid HT \cup TH) = \frac{1}{2}$.
If a coin is suspected to be unfair, this result, properly used, will convert
a coin into a “fair” coin. Explain how this may be done.


9. One person tosses a coin, another picks a card, and a third throws 2 
dice. 
a. What is the probability that a head, a red card, and the sum 8 occur?
b. What is the probability that at least one of these events occurs? State
what assumptions you are making to arrive at your answer. 


10. Suppose that the table of a sample space is as follows:


Let A = {d, e, f} and B = {b, c, e, f}. Show that A and B are independent. 
11. Ten people, chosen at random, are in a room. What is the probability
that at least one of them has April 11 as his birthday? (Hint: Ignore leap 
years, assume uniformity and independence, and use Equation 3.26 and the 
binomial theorem.) 


12. Some people answer Exercise 11 by claiming that since the probability 
is 1/365 in each case, the answer is 10/365, because there are 10 people. 
Using the binomial theorem, explain why this argument gives the approximate 
answer in this case but gives a terrible answer if there are 200 people in the 
room. 


13. An urn contains 9 red balls and 1 white ball. Twenty balls are chosen, 
one at a time, with replacement. What is the probability that the white ball
will be chosen? Do not evaluate. 


14. A coin is tossed repeatedly until a head occurs. What is the probability 
that k tosses (k = 1,2,3,...) are required? 


15. Write out Equation 3.26 explicitly for s = 3. Identify each one of the 
summands as a probability of an appropriate event. 


16. Generalize Exercise 15 to an arbitrary number s. 


17. How many integers are there between 1 and 840 inclusive which have 
no factor except 1 in common with 840?


18. How many integers are there between 1 and 12,000 inclusive which 
have a common factor, larger than 1, with 12,000?


19. Prove: If A C B and if A and B are independent, then either A is 
impossible or B is certain. 


20. Alex, Ben, and Carl independently attempt to solve a puzzle. The 
probabilities that they solve the puzzle are, respectively, .8, .9, and .7. Find
the probability that the puzzle will be solved. State your assumptions. 


21. A die is tossed repeatedly, at most 10 times. What is the
probability that a 2 occurs before a 3? (i.e., a 2 occurs on some trial, and a 3 
did not occur on the previous trials.) What is the probability that neither a 2 
nor a 3 occurs? 


22. A die is tossed until a 2 or 3 turns up. What is the probability that a 2 
occurs before a 3? 


23. Using Formula 3.5, prove that if A, B, and C are independent, then
$A \cup B$ and C are independent.


24. Prove: If A and C are independent, B and C are independent, and
$A \cap B = \varnothing$, then $A \cup B$ and C are independent.
*5 CONSTRUCTION OF SAMPLE SPACES


In the preceding sections we introduced many different probability spaces to 
compute various probabilities. In this section we shall investigate some of 
the underlying assumptions behind our choice. 

We first note that we have consistently treated p(A|B) as a probability, 
although the definition (Equation 3.15) does not, by itself, give us a probability 
space. Equation 3.15 is sufficient motivation for the following definition. 


39 Definition 

Let S be a probability space, and let $B \subset S$ with $p(B) \neq 0$. The probability
space S|B (read: S restricted to B) is the probability space whose sample
points are the same as S but whose probabilities $p(s \mid B)$ are defined by the
formulas

$$p(s \mid B) = \frac{p(s)}{p(B)} \quad \text{if } s \text{ is in } B \qquad (3.28)$$

$$p(s \mid B) = 0 \quad \text{if } s \text{ is not in } B \qquad (3.29)$$


To verify that this is a probability space in the sense of Definition 6 of 
Chapter 1, we must verify Equation 1.5 for this probability. (Equation 1.4 is 
a trivial consequence of Equation 1.5, because each probability is nonnegative.)
We point out that the sum $\sum_{s \in S} p(s \mid B)$ can be computed only for s
in B, because the other terms are zero by Equation 3.29. We leave the details
of this proof to the reader. 

Often S|B is defined by taking B as the sample space and using Equation
3.28. This amounts to ignoring the points of $\bar B$. The distinction between the
two notions is not too critical, because the ignored points have probability 0.

Once Definition 39 is accepted, we may define the probability (in the 
restricted probability space S|B) of an event A in the sense of Definition 
12 of Chapter 1. It is not surprising, of course, that this probability is simply 
p(A|B). We leave the formal proof to the reader. In turn, this implies that all 
the concepts, theorems, and methods of probability theory apply to conditional
probability. For example, $p(\bar A \mid B) = 1 - p(A \mid B)$ and
$p(A \cup B \mid C) = p(A \mid C) + p(B \mid C) - p(A \cap B \mid C)$. We may even form conditional
probabilities in S|B, although this does not lead to anything new. (See Exercises
4 and 5 at the end of this section.)
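Definition 39 is simple to implement for a finite space stored as a mapping from sample points to probabilities. The minimal sketch below uses a hypothetical four-point space; the helper names `restrict` and `prob` are our own. It checks that S|B is a probability space, that the probability of A computed in S|B equals p(A|B), and that the complement rule holds there as well.

```python
from fractions import Fraction

def restrict(space: dict, B: set) -> dict:
    """S|B per Definition 39: p(s|B) = p(s)/p(B) for s in B, and 0 otherwise."""
    pB = sum(space[s] for s in B)
    if pB == 0:
        raise ValueError("S|B requires p(B) != 0")
    return {s: (space[s] / pB if s in B else Fraction(0)) for s in space}

def prob(space: dict, A: set) -> Fraction:
    return sum((space[s] for s in A), Fraction(0))

# A hypothetical four-point space.
S = {"a": Fraction(1, 10), "b": Fraction(2, 10), "c": Fraction(3, 10), "d": Fraction(4, 10)}
A, B = {"a", "c"}, {"b", "c", "d"}
SB = restrict(S, B)

assert prob(SB, set(S)) == 1                         # S|B is a probability space
assert prob(SB, A) == prob(S, A & B) / prob(S, B)    # p(A) in S|B equals p(A|B)
assert prob(SB, set(S) - A) == 1 - prob(SB, A)       # complement rule holds in S|B
```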

In the discussion following Example 20 we observed that $p(A \mid B)$ can be
evaluated if the probabilities of A, given certain instances of B are known. 
This result is governed by the following theorem. 


40 Theorem 
Suppose B is the union of mutually exclusive events $B_1, \dots, B_s$, and that
$p_i = p(B_i) \neq 0$. Suppose $p(A \mid B_i) = p$, the same value for each i. Then
$p(A \mid B) = p$.


Remark. We have already noted in the discussion following Example 3.21 
that it is essential that the B; be mutually exclusive. 

To prove this theorem, we first note that $B \cap A$ is the union of the mutually
exclusive events $B_i \cap A$ (see Fig. 3.11). Thus, if an outcome is in B
and in A, it must be in exactly one $B_i$ and in A, and conversely. Then we have

3.11 $(B_1 \cup \cdots \cup B_s) \cap A = (B_1 \cap A) \cup \cdots \cup (B_s \cap A)$

$$
\begin{aligned}
p(B \cap A) &= p(B_1 \cap A) + \cdots + p(B_s \cap A) \\
&= p(B_1)\,p(A \mid B_1) + \cdots + p(B_s)\,p(A \mid B_s) \\
&= p_1 p + \cdots + p_s p \\
&= (p_1 + \cdots + p_s)\,p = p(B)\,p
\end{aligned}
$$

because B is the union of the mutually exclusive $B_i$. Thus

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)} = p$$

which is the result. The reader should construct a tree diagram to clarify
the proof.
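A small experiment with one fair die shows both Theorem 40 and the reason for the Remark. In the sketch below (the events are our own choices), mutually exclusive $B_1, B_2$ with equal conditional probabilities give the same value for their union, while two overlapping events with equal conditional probabilities do not.

```python
from fractions import Fraction

def cond(A: set, B: set) -> Fraction:
    """p(A|B) = n(A ∩ B)/n(B) in a uniform space (one fair die here)."""
    return Fraction(len(A & B), len(B))

# Mutually exclusive B_1, B_2 with p(A|B_1) = p(A|B_2): Theorem 40 applies.
A = {1, 3}
B1, B2 = {1, 2}, {3, 4}
assert cond(A, B1) == cond(A, B2) == cond(A, B1 | B2) == Fraction(1, 2)

# Overlapping C_1, C_2 with equal conditional probabilities: the conclusion fails.
A2 = {1}
C1, C2 = {1, 2}, {1, 3}
assert cond(A2, C1) == cond(A2, C2) == Fraction(1, 2)
assert cond(A2, C1 | C2) == Fraction(1, 3)
```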
Another method of constructing probability spaces was to identify different
outcomes by agreeing to regard them as the same outcome. In Example 1
of Chapter 1 we considered only 6 sample points, despite the common
knowledge that there were more than 6 observable outcomes. Rather, we
took the event “highest number is 6” and regarded it as one sample
point, along with 5 other similarly constructed sample points. This procedure
of identification may be defined as follows.


41 Definition 
If S is a sample space with events $A_1, \dots, A_s$ such that (a) the $A_i$ are mutually
exclusive and (b) $S = A_1 \cup \cdots \cup A_s$, then we say that the $A_i$’s form a
partition P of S.


42 Definition 

If S is a sample space and $\{A_1, \dots, A_s\}$ is a partition P of S, we define the
new sample space $S \div P$ (read: S identified under P) by taking as sample
points the events $A_1, \dots, A_s$ and defining the probabilities $p(A_i)$ to be the
probability of $A_i$ as an event.

In brief, $p(A_i)$ is as it always was, but we now regard $A_i$ as a sample point
of $S \div P$.

We leave to the reader the easy verification that $S \div P$ is a sample space in
the sense of Definition 8 of Chapter 1. A useful property of $S \div P$ is that
the probabilities of events are consistent with probabilities in S. For example,
if we take S as the sample space for 3 dice and $A_i$ to be the event
“high die is i,” then $S \div P$ is the 6-point sample space of Example 1 of
Chapter 1. The event E “high die is even” may be regarded in $S \div P$ as the
event $\{A_2, A_4, A_6\}$ or as an event in S. In either case the probability is the
same.
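The consistency claim for the 3-dice example can be checked by building the identified space explicitly. In the minimal sketch below, the hypothetical helper `identify` groups the 216 ordered triples into the blocks $A_i$ = “high die is i” and gives each block its probability as an event in S; the probability of E is then computed both ways.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# S: uniform space of ordered triples for 3 dice.
S = {w: Fraction(1, 216) for w in product(range(1, 7), repeat=3)}

def identify(space: dict, label) -> dict:
    """Hypothetical helper: the identified space whose points are the blocks
    A_i = {s : label(s) == i}; each block keeps its probability in S."""
    blocks = defaultdict(Fraction)
    for s, q in space.items():
        blocks[label(s)] += q
    return dict(blocks)

high = identify(S, max)   # A_i = "high die is i"

assert sum(high.values()) == 1
assert high[6] == Fraction(91, 216)
# E = "high die is even", computed in the identified space and directly in S.
p_E_identified = high[2] + high[4] + high[6]
p_E_direct = sum(q for s, q in S.items() if max(s) % 2 == 0)
assert p_E_identified == p_E_direct == Fraction(135, 216)
```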

The following examples illustrate this identification procedure: 

1. In any experiment, such as the 3-dice experiment of Example 1 of 
Chapter 1, when we identify an outcome by some characteristic, we are 
identifying in the sense of Definition 42. Each set A; contains all the elements 
having a particular characteristic. 

2. In D’Alembert’s problem (Example 17 of Chapter 1) and in the dice-tossing
problem of Example 34, we stopped the experiment as soon as an
occurrence happened. Ordered n-tuples were required, and the procedure
involved (in theory) continuing n times because “it didn’t matter.” In
D’Alembert’s problem we agreed that the event H was to be construed as
{HT, HH}. In dice tossing, if we throw at most 3 times and if we stop at
the first 6, we agree that the event (1, 6) shall be regarded as a shorthand for
{(1, 6, 1), (1, 6, 2), ..., (1, 6, 6)}. Thus (1, 6) was an event consisting
of all triples (1, 6, x). We regard this event as a sample point because of this
identification procedure.

3. When we go from ordered to unordered n-tuples, we identify different 
ordered n-tuples if they yield the same unordered n-tuple. This was done in 
poker. It can be done in tossing 2 identical dice and finding the sample 
space of observably different outcomes. Here $p(1, 1) = \frac{1}{36}$, but $p(1, 2) = \frac{1}{18}$,
because (1,2) is the event {(1,2), (2, 1)}. (We use the symbol (a,b) to 
designate an unordered couple.) 
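Example 3 above can be reproduced by identifying ordered couples that sort to the same unordered couple. The sketch below represents an unordered couple, as an illustrative choice, by the sorted tuple (a, b) with a ≤ b, and accumulates the probabilities of the ordered couples in each block.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Identify ordered couples that give the same unordered couple.
unordered = Counter()
for a, b in product(range(1, 7), repeat=2):
    unordered[tuple(sorted((a, b)))] += Fraction(1, 36)

print(len(unordered))                        # 21 observably different outcomes
print(unordered[(1, 1)], unordered[(1, 2)])  # 1/36 1/18
```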
Another, most important, construction is that of the product space $S \times T$,
where S and T are probability spaces. We have already defined $S \times T$ as a
set (see Definition 1 of Chapter 2), and we shall shortly define probabilities
in $S \times T$. The motivation for this definition is that if we choose a point s of
S independently of a point t of T, the probability of choosing s and t is
$p(s)\,p(t)$. Choosing s and t may be regarded as choosing an ordered couple
(s, t), and we are led to the following definition.


43 Definition 

Let S and T be probability spaces. Then $S \times T$ is defined to be the probability
space whose points are all ordered couples (s, t) with s in S and t in T and
with probability defined by

$$p(s, t) = p(s)\,p(t) \qquad (3.30)$$


44 Example 

Let S = {a,b,c} with p(a) = .3, p(b) = .5, p(c) = .2. Let T = {x, y, z}, with 
p(x) = .1, p(y) = .7, p(z) = .2. Then $S \times T$ has the 9 sample points of Fig. 3.12.
The probabilities appear under the sample points. The probability of (b, z), 
for example, is given by Equation 3.30: p(b, z) = p(b)p(z) = (.5)(.2) = .10. 


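3.12 Product Space

            x (.1)        y (.7)        z (.2)
a (.3)      (a, x)  .03   (a, y)  .21   (a, z)  .06
b (.5)      (b, x)  .05   (b, y)  .35   (b, z)  .10
c (.2)      (c, x)  .02   (c, y)  .14   (c, z)  .04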


It is necessary to verify that $S \times T$ is, in fact, a probability space in the
sense of Definition 6 of Chapter 1. We observe in Fig. 3.12 that the sum of
each row in the body of the table equals the probability of the particular
element in S which determines that row. We show this in general. Let
$S = \{s_1, \dots, s_n\}$, $T = \{t_1, \dots, t_r\}$. Then for each $s_i$ in S,

$$
\begin{aligned}
p(s_i, t_1) + p(s_i, t_2) + \cdots + p(s_i, t_r) &= p(s_i)\,p(t_1) + p(s_i)\,p(t_2) + \cdots + p(s_i)\,p(t_r) \\
&= p(s_i)\,[p(t_1) + p(t_2) + \cdots + p(t_r)] \\
&= p(s_i)
\end{aligned}
$$
Using the $\Sigma$ sign, we have $\sum_{t \in T} p(s_i, t) = p(s_i)$. Thus, if we add the
probabilities of all the elements in $S \times T$, we obtain (summing one row at a time)
$p(s_1) + \cdots + p(s_n) = 1$. This proves Equation 1.4. As before, Equation 1.5
is trivial. Thus $S \times T$ is a sample space.
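The row-sum argument can be replayed on Example 44. The minimal sketch below builds $S \times T$ from Definition 43 (the function name `product_space` is our own) and checks the entry for (b, z), each row sum, and the total.

```python
from fractions import Fraction

def product_space(S: dict, T: dict) -> dict:
    """Definition 43: points (s, t) with p(s, t) = p(s) p(t)."""
    return {(s, t): ps * pt for s, ps in S.items() for t, pt in T.items()}

S = {"a": Fraction(3, 10), "b": Fraction(5, 10), "c": Fraction(2, 10)}
T = {"x": Fraction(1, 10), "y": Fraction(7, 10), "z": Fraction(2, 10)}
ST = product_space(S, T)

assert ST[("b", "z")] == Fraction(1, 10)   # (.5)(.2) = .10
for s in S:                                  # each row sums to p(s)
    assert sum(ST[(s, t)] for t in T) == S[s]
assert sum(ST.values()) == 1                 # so S x T is a sample space
```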

The extension to several spaces is straightforward. We merely define
$S \times T \times R = (S \times T) \times R$, identifying the element ((s, t), r) with the ordered
triple (s, t, r). Its probability is $p(s, t, r) = p(s)\,p(t)\,p(r)$. A similar procedure
applies to n spaces.


45 Definition 
If $S_1, S_2, \dots, S_n$ are probability spaces, the probability space $S_1 \times S_2 \times \cdots \times S_n$
is the set of n-tuples $(s_1, \dots, s_n)$, with $s_i$ in $S_i$ and with probability
defined by

$$p(s_1, s_2, \dots, s_n) = p(s_1)\,p(s_2) \cdots p(s_n) \qquad (3.31)$$

We may verify that this is a probability space by using the result for n = 2
over and over again.

An important special case is when the spaces S; are identical. We then use 
the power notation. 


46 Definition 
$S^n = S \times S \times \cdots \times S$ (n factors).

Equation 3.31 shows that product spaces are the appropriate probability
spaces to use if the experiments for $S_1, \dots, S_n$ are to be performed
independently. Thus each sequence of n possible outcomes in the n experiments is
represented by one outcome $(s_1, \dots, s_n)$ with probability given by Equation
3.31. In particular, if one experiment (sample space S) is repeated
independently n times, the appropriate sample space is $S^n$. We have used this
idea for coins and dice, always taking $S^n$ to be uniform. The reason for the
uniformity assumption can be explained using the following theorem.


47 Theorem 


If S and T are uniform, so is $S \times T$.

In fact, if S has n elements, and T has m elements, $S \times T$ has nm elements.
The probability of any one element is $p(s, t) = p(s)\,p(t) = (1/n)(1/m) = 1/nm$.
Thus $S \times T$ is uniform.

The extension to n spaces and to $S^n$ is immediate. Thus the assumption
that a coin is fair (S = {H, T} is uniform) and that the trials are independent
implies that $S^n$ is a uniform sample space.

If S and T are probability spaces and $A \subset S$, $B \subset T$, then $A \times B$ is an
event in $S \times T$. (Verbally, $A \times B$ is the event “first component in A, second in
B.”) Since we think of A and B as independent, the following result is not
surprising.
48 Theorem
If $A \subset S$, $B \subset T$, then

$$p(A \times B) = p(A)\,p(B) \qquad (3.32)$$

To prove this theorem, write $A = \{a_1, \dots, a_r\}$, $B = \{b_1, \dots, b_s\}$. Fixing $a_i$
in A, we have

$$
\begin{aligned}
p(\{a_i\} \times B) &= p(a_i, b_1) + p(a_i, b_2) + \cdots + p(a_i, b_s) \\
&= p(a_i)\,p(b_1) + p(a_i)\,p(b_2) + \cdots + p(a_i)\,p(b_s) \\
&= p(a_i)\,[p(b_1) + \cdots + p(b_s)] \\
&= p(a_i)\,p(B)
\end{aligned}
$$

But A is the union of its elements: $A = \{a_1\} \cup \cdots \cup \{a_r\}$. Thus

$$
\begin{aligned}
p(A \times B) &= p(\{a_1\} \times B) + \cdots + p(\{a_r\} \times B) \\
&= p(a_1)\,p(B) + \cdots + p(a_r)\,p(B) \\
&= [p(a_1) + \cdots + p(a_r)]\,p(B) = p(A)\,p(B)
\end{aligned}
$$

This is the result. (Our previous remark that the sum of the probabilities in
each row of $S \times T$ corresponding to an s in S is equal to the probability of s
is the special case A = {s}, B = T.)

This result immediately generalizes to n factors. The proof merely uses 
the result for two factors over and over. 
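The n-factor version can be checked on a small example. The sketch below reuses the spaces of Example 44 and adds a hypothetical two-point third factor R; it builds the product space from Definition 45 and compares the probability of a "rectangle" $A \times B \times C$ with the product of the three marginal probabilities.

```python
from fractions import Fraction
from itertools import product
from math import prod

def product_space(*spaces):
    """Definition 45: n-tuples with p(s_1, ..., s_n) = p(s_1)...p(s_n)."""
    return {w: prod(sp[s] for sp, s in zip(spaces, w)) for w in product(*spaces)}

def prob(space, event_set):
    return sum(space[w] for w in event_set)

S = {"a": Fraction(3, 10), "b": Fraction(5, 10), "c": Fraction(2, 10)}
T = {"x": Fraction(1, 10), "y": Fraction(7, 10), "z": Fraction(2, 10)}
R = {0: Fraction(1, 4), 1: Fraction(3, 4)}   # a hypothetical third factor
STR = product_space(S, T, R)

A, B, C = {"a", "b"}, {"y", "z"}, {1}
rect = set(product(A, B, C))                  # the event A x B x C
assert prob(STR, rect) == sum(S[a] for a in A) * sum(T[b] for b in B) * R[1]
```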


49 Theorem 
If $A_i \subset S_i$ (i = 1, 2, ..., n), then

$$p(A_1 \times A_2 \times \cdots \times A_n) = p(A_1)\,p(A_2) \cdots p(A_n) \qquad (3.33)$$

If A and B are events in S and T, respectively, it is not correct to say that
A and B are independent events in $S \times T$. In fact, neither A nor B is an event
in $S \times T$! However, we can say that the event $[A]_1$ (of $S \times T$) consisting of
all elements whose first component is in A and the event $[B]_2$ of elements
whose second component is in B are independent.


50 Definition 


Let $S = S_1 \times S_2 \times \cdots \times S_n$ be a product of n probability spaces, and let A be an
event in $S_i$. We let $[A]_i$ be the set of all elements of S whose ith component
is in A. Equivalently,

$$[A]_i = S_1 \times S_2 \times \cdots \times S_{i-1} \times A \times S_{i+1} \times \cdots \times S_n \qquad (3.34)$$

In this case we say that the event $[A]_i$ is determined by A in the ith
component.
It is an immediate consequence of Equation 3.33 that

$$p([A]_i) = p(A) \qquad (3.35)$$

because all factors on the right-hand side of Equation 3.33 are 1, except
$p(A)$. For example, if 7 dice are tossed, the probability that the third toss
is a 4 or a 5 is $\frac{1}{3}$. Here S = {1, 2, 3, 4, 5, 6}, taken to be uniform, A = {4, 5},
and in $S^7$ we have $p([A]_3) = p(A) = \frac{2}{6} = \frac{1}{3}$. $[A]_3$ is the event of all possible
throws on which the third die is a 4 or a 5.
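The 7-dice illustration of Equation 3.35 can be confirmed by counting in the uniform space $S^7$, which has $6^7 = 279{,}936$ points, few enough to enumerate directly. This minimal sketch counts the tuples whose third component (index 2) is 4 or 5.

```python
from fractions import Fraction
from itertools import product

# [A]_3 in S^7: all throws of 7 dice whose third die is 4 or 5.
count = sum(1 for w in product(range(1, 7), repeat=7) if w[2] in (4, 5))
p_A3 = Fraction(count, 6**7)
print(p_A3)  # 1/3
```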

The general notion that if the product space $S \times T$ is used, then any event
of S is independent of any event of T, is expressed by the following theorem.


51 Theorem 
In $S \times T$ suppose that $[A]_1$ is determined by A in the first component and
$[B]_2$ is determined by B in the second component. Then $[A]_1$ and $[B]_2$ are
independent.

To prove this we set $[A]_1 = A \times T$, $[B]_2 = S \times B$. Then $[A]_1 \cap [B]_2$ is
the set of points whose first component is in A and whose second is in B:
$[A]_1 \cap [B]_2 = A \times B$ (see Fig. 3.13). Thus

$$p([A]_1 \cap [B]_2) = p(A \times B) = p(A)\,p(B) = p([A]_1)\,p([B]_2)$$

This is the result.
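Theorem 51 can be illustrated on the product space of Example 44. In the sketch below the events A = {a, c} and B = {x, z} are our own choices; the code forms $[A]_1 = A \times T$ and $[B]_2 = S \times B$ as sets of couples and checks Equation 3.35 and the product rule for their intersection.

```python
from fractions import Fraction

S = {"a": Fraction(3, 10), "b": Fraction(5, 10), "c": Fraction(2, 10)}
T = {"x": Fraction(1, 10), "y": Fraction(7, 10), "z": Fraction(2, 10)}
ST = {(s, t): S[s] * T[t] for s in S for t in T}

def prob(event):
    return sum(q for w, q in ST.items() if w in event)

A, B = {"a", "c"}, {"x", "z"}
A1 = {(s, t) for (s, t) in ST if s in A}   # [A]_1 = A x T
B2 = {(s, t) for (s, t) in ST if t in B}   # [B]_2 = S x B

assert prob(A1) == sum(S[s] for s in A)        # Equation 3.35
assert prob(A1 & B2) == prob(A1) * prob(B2)    # Theorem 51
```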


3.13 $A \times B = [A]_1 \cap [B]_2$


In an entirely analogous way we arrive at the following useful generaliza- 
tion to n spaces. 


52 Theorem 
Let $S = S_1 \times \cdots \times S_n$, and let $A_1, A_2, \dots, A_r$ be events in S, each determined
by events in different components. Then the events $A_1, \dots, A_r$ are independent.
In analogy with Theorem 11 or 11′ of Chapter 1, it is possible to construct
appropriate probability spaces for a many-stage experiment in which the
occurrences (i.e., the sample space) at the ith stage depend upon what
occurred at the previous stages. We shall not go into this in any detail, but
the general idea is worth noting. For two stages $S \times T$ is made into a
probability space by first assigning $p_{st}$ = p(2nd component is t | 1st component is s)
(= the probability of going from s to t), and then formally defining
$p(s, t) = p(s)\,p_{st}$. (It is required that $\sum_{t \in T} p_{st} = 1$.) This device is appropriate,
for example, in choosing ordered couples without replacement. We merely
choose $p_{ss} = 0$. The generalization to several spaces is similar to the
generalization of product spaces to several factors.
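The two-stage construction can be sketched for ordered couples drawn without replacement. The example below is hypothetical (4 items, a uniform first draw, and a uniform second draw among the 3 remaining items); it checks that each row of transition probabilities $p_{st}$ sums to 1 and that $p(s, t) = p(s)\,p_{st}$ then defines a probability space.

```python
from fractions import Fraction

# Hypothetical example: draw 2 of the 4 items {1, 2, 3, 4} in order, without replacement.
items = [1, 2, 3, 4]
p_first = {s: Fraction(1, 4) for s in items}

def transition(s, t):
    """p_st = p(2nd component is t | 1st component is s); p_ss = 0."""
    return Fraction(0) if s == t else Fraction(1, len(items) - 1)

# Each row of transition probabilities must sum to 1.
assert all(sum(transition(s, t) for t in items) == 1 for s in items)

# p(s, t) = p(s) p_st defines a probability on the set of ordered couples.
couples = {(s, t): p_first[s] * transition(s, t) for s in items for t in items}
assert sum(couples.values()) == 1
assert couples[(1, 2)] == Fraction(1, 12) and couples[(3, 3)] == 0
```

Every couple with distinct components gets probability 1/12, which is the uniform space of ordered couples without replacement.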


EXERCISES 


1. Prove: S|S =S. 
2. Show that in S|B, the ratio $p(s_1)/p(s_2)$ is preserved for $s_1$ and $s_2$ in B:

$$\frac{p(s_1)}{p(s_2)} = \frac{p(s_1 \mid B)}{p(s_2 \mid B)} \qquad \text{for } s_1, s_2 \text{ in } B$$

(For example, if $s_1$ is twice as likely as $s_2$ and both are in B, then $s_1$ remains
twice as likely as $s_2$ if B is given.)
3. Let S be a probability space and let $B \subset S$ with $p(B) \neq 0$. Let $p'(s)$
be a new probability defined in S such that

a. $p'(s_1)/p'(s_2) = p(s_1)/p(s_2)$ for $s_1, s_2$ in B.
b. $p'(s) = 0$ for s not in B.

Prove that $p'(s) = p(s \mid B)$. [Remark: Equation a means that $p'(s_1)\,p(s_2) = p(s_1)\,p'(s_2)$.
This avoids all questions about dividing by 0.]


4. Prove: (S|B)|C exists if and only if $p(B \cap C) \neq 0$.
5. Prove: $(S|B)|C = S|(B \cap C)$ if $p(B \cap C) \neq 0$.


6. Let S be a finite probability space and let Z be the set of points whose 
probability is 0. Show how S —Z may be made into a probability space S in 
a natural manner. 


7. Using the notation of Exercise 6, prove that if S is uniform and 
$p(B) \neq 0$, then (S|B) is uniform.


8. Let S={a, b, c, d, e, f}, with p(a) = p(b) =.2, p(c) =.3, and 
p(d) = p(e) =p(f) =.1. Let A = {a, c, d}, B= {a, e, f}, C = {c, d, e, f}. 
Construct the probability space for

a. S|A                          b. $S|(B \cap C)$
c. $S \times S$                 d. $S \div P$, where P is the partition $\{A, \bar A\}$
e. $(S \times S)|(B \times C)$  f. $(S \times (S|B))|(S \times (S|C))$


9, Suppose that B = B, U B, with B, N B, = @. Given that p(A|B,) = 34 
and p(A|B,) =3, is it possible to compute p(A|B)? Explain. What can be 
said about the value of p(A|B)? If p(B;) = p(B), find p(A|B). Suppose B, 
is twice as likely as B,. Find p(A|B) in this case. 


10. The following experiments have probability spaces that can be 
written in terms of restrictions, identifications, and/or products. Illustrate 
how this may be done, and briefly discuss your underlying assumptions. 

a. Jim plays Jean 5 games of Ping-Pong, the play stopping when a 
player has won 3 games. The probability that Jim wins a game is .6. 

b. A die is tossed until a 5 or a 6 turns up, or for 10 tosses, whichever 
comes first. 

c. An urn contains 5 black balls, 8 white balls, and 13 green balls. A
ball is chosen and the color is observed. 

d. In part c, 2 balls are chosen and the colors are observed. 

e. Two identical-looking dice are tossed, and the numbers that appear 
are recorded. 


11. The probability space of a loaded die is given by the following table: 


Two such dice are tossed. Construct the probability space for the outcomes. 
Find the probability of the sum 7. Of the sum 7 or 11. Find the probability 
that the high die is a 6. Find the probability that either a 6 or a 1 appears. 


12. When shopping for a car, Jack will buy brand X, Y, or Z with prob- 
ability .3, .5, and .2, respectively. The color of his car will be red, tan, or 
green with probability .7, .2, and .1, respectively. Assuming that color and 
brand are independent, construct an appropriate probability space for the 
descriptions of Jack’s future car. 


13. In Exercise 12 verify Equation 3.32 if $A = \{X, Y\}$ and $B = \{\text{tan}, \text{green}\}$.


14. If $n$ dice are tossed ($n \geq 2$), express the following events using the $[A]_i$ notation.
a. The first die is a 4 or a 5.
b. The first or second die is a 6.
c. The first or last die is even.
Compute the probabilities of the events in parts a, b, and c.


15. The following table is a table of the probabilities in the product space 
S X T. Fill in the missing entries. (Hint: In any product space, prove that any 
2 rows are proportional.) 


16. Suppose a coin has probability $p$ of landing heads and probability $q = 1 - p$ of landing tails. The coin is tossed 4 times. Construct the probability space. (Hint: Let $A = \{H, T\}$. Find $A^2$, then $A^2 \times A^2 = A^4$.)


17. An experiment has the probability space $S = \{a, b, c\}$, with $p(a) = .5$, $p(b) = .4$, $p(c) = .1$. The experiment is run as many times as necessary to obtain the same result twice. Construct an appropriate probability
space for the possible outcomes. (Hint: Use a tree diagram.) 


18. Five cards are taken at random from a deck. Using Theorem 40, find 
the probability that 4 of the cards have the same rank, given that 3 have the 
same rank. Explain why this theorem cannot be used to find the probability 
that 3 have the same rank, given that 2 have the same rank. 


CHAPTER 4 MISCELLANEOUS TOPICS


*1 REPEATED TRIALS


Suppose that an experiment has probability space $S$, and that $A$ is an event of $S$ with probability $p = p(A)$. We let $P_n$ denote the probability that the event $A$ will occur at least once if the experiment is repeated, independently, $n$ times. By Theorem 37 of Chapter 3, we have

$$P_n = 1 - (1-p)^n \tag{4.1}$$


The cases $p = 0$ and $p = 1$ naturally give $P_n = 0$ and $P_n = 1$, while the case $n = 1$ naturally gives $P_1 = p$. We shall therefore usually assume that

$$n > 1 \quad \text{and} \quad 0 < p < 1 \tag{4.2}$$


In this section we shall be concerned with estimating the size of $P_n$ for large values of $n$. To obtain the proper orientation, the reader should imagine $p$ small and $n$ large. Thus the event $A$ will probably not occur on a given trial, but we are repeating the experiment many times. These two tendencies have opposite effects: $1 - (1-p)^n$ is near 0 if $p$ is very small but is near 1 if $n$ is very large. (Why?) We shall soon see that the behavior of $P_n$ for large $n$ depends largely on the value $np$.


1 Example 


Ten people are in a room. What is the probability that at least 1 person was 
born on June 1? 
We make the simplifying assumption that the probability of being born on June 1 is $p = \frac{1}{365}$ and that the births are independent. Thus

$$P_{10} = 1 - \left(1 - \frac{1}{365}\right)^{10} = 1 - \left(\frac{364}{365}\right)^{10}$$

If we compute $P_{10}$ to 3 significant figures, we have $P_{10} = .0271$.


In this example $np = \frac{10}{365} = .0274$. The following theorem can be used to approximate $P_n$ if $np$ is small.


2 Theorem
$$P_n = 1 - (1-p)^n \approx np \quad \text{if } np \text{ is small} \tag{4.3}$$
More precisely,
$$np - \frac{(np)^2}{2} < P_n < np \tag{4.4}$$


In the above example, $np = .0274$. Thus $.0274 - (.0274)^2/2 < P_{10} < .0274$. Since $(.0274)^2/2 < (.03)^2/2 = .0009/2 < .0005$, we have the (crude) approximation $.0269 < P_{10} < .0274$. This is good enough to give $P_{10} = .027$ to 3 decimal places. In general, the smaller $np$ is, the more effective is the estimate for $P_n$ given by Equation 4.4.

Equation 4.3 may be made reasonable by using the binomial theorem. We have $(1-p)^n \approx 1 - \binom{n}{1}p = 1 - np$ if $np$ is small. Thus $P_n = 1 - (1-p)^n \approx 1 - (1 - np) = np$ if $np$ is small. We shall not prove Formula 4.4, because the proof would take us too far afield.
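As a quick numerical check of Example 1 against Theorem 2, the Python sketch below evaluates Equation 4.1 directly in floating point and compares the result with the bounds of Equation 4.4. It is only an illustration; the helper names are hypothetical and do not come from the text.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """P_n = 1 - (1 - p)^n: chance that an event of probability p
    occurs at least once in n independent trials (Equation 4.1)."""
    return 1.0 - (1.0 - p) ** n


def small_np_bounds(p: float, n: int) -> tuple[float, float]:
    """The lower and upper bounds np - (np)^2/2 and np of Equation 4.4."""
    mu = n * p
    return mu - mu ** 2 / 2, mu


# Example 1: ten people, each with probability 1/365 of a June 1 birthday.
p, n = 1 / 365, 10
exact = prob_at_least_once(p, n)
low, high = small_np_bounds(p, n)
print(f"P_10 = {exact:.4f}; bounds {low:.4f} < P_10 < {high:.4f}")
```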


3 Example
If 500 people are in a room, what is the probability that at least 1 of these people has his birthday on June 1?

As in Example 1, we have $P_{500} = 1 - (1 - \frac{1}{365})^{500}$. Here $p = \frac{1}{365}$, $n = 500$, and $np = \frac{500}{365} = 1.370$, so the approximation formula 4.3 or 4.4 is no help. A calculation using logarithms yields $P_{500} = .746$.

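It turns out that there is a very useful approximation for $P_n$ when $n$ is large, which we shall now indicate. We shall compute $(1 + x/n)^n$ by the binomial theorem and then see what happens as $n$ gets large. (In the above example $x = -500/365$ and $n = 500$.) We have

$$\begin{aligned}
\left(1 + \frac{x}{n}\right)^n &= 1 + n\,\frac{x}{n} + \frac{n(n-1)}{2!}\,\frac{x^2}{n^2} + \frac{n(n-1)(n-2)}{3!}\,\frac{x^3}{n^3} + \cdots + \frac{x^n}{n^n} \\
&= 1 + x + \left(1 - \frac{1}{n}\right)\frac{x^2}{2!} + \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\frac{x^3}{3!} + \cdots
\end{aligned}$$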
This expression suggests that we consider the infinite series
$$1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
because, when $n$ is large, $1/n \approx 0$, $2/n \approx 0$, etc. This series is one of the most famous in higher mathematics. It is proved in calculus texts that there is a certain number $e = 2.71828\ldots$ (named after the Swiss mathematician Euler) and that this series is the series for $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \tag{4.5}$$
The letter $e$ is always reserved for this constant in much the same way that $\pi$ is reserved for the constant $3.14159\ldots$. A proof of Equation 4.5 is beyond the scope of this text. Appendixes A and B give values of $e^x$ for different values of $x$, which we shall freely use. From the above analysis it seems reasonable to expect the approximation

$$\left(1 + \frac{x}{n}\right)^n \approx e^x \quad \text{for large } n \tag{4.6}$$

Equivalently, setting $x/n = y$,
$$(1 + y)^n \approx e^{ny} \quad \text{for large } n \text{ and small } y \tag{4.7}$$
We state without proof a more precise formulation of these approximations, which are analogous to Equation 4.4.


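4 Theorem. Approximation of $(1 + x/n)^n$
$$\left(1 - \frac{x^2}{n}\right)e^{x} < \left(1 + \frac{x}{n}\right)^n < e^{x} \qquad (x > 0) \tag{4.8}$$

$$\left(1 - \frac{x^2}{n}\right)e^{-x} < \left(1 - \frac{x}{n}\right)^n < e^{-x} \qquad \left(0 < x < \frac{n}{2}\right) \tag{4.9}$$

In particular, if we set $x = \mu = np$ in Equation 4.9, we obtain the approximation

$$\left(1 - \frac{\mu^2}{n}\right)e^{-\mu} < (1-p)^n < e^{-\mu} \qquad \left(\mu = np,\ 0 < p < \tfrac{1}{2}\right) \tag{4.10}$$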
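The inequalities of Theorem 4 can be spot-checked numerically. In the Python sketch below, the sample pairs $(x, n)$ are arbitrary choices (two of them echo Examples 3 and 5), the function name is hypothetical, and a check at a few points only illustrates the theorem; it does not prove it.

```python
import math


def theorem_4_holds(x: float, n: int) -> tuple[bool, bool]:
    """Check (4.8) and (4.9) at one point; assumes x > 0 and x < n/2."""
    factor = 1 - x * x / n
    holds_48 = factor * math.exp(x) < (1 + x / n) ** n < math.exp(x)
    holds_49 = factor * math.exp(-x) < (1 - x / n) ** n < math.exp(-x)
    return holds_48, holds_49


samples = [(0.5, 10), (500 / 365, 500), (2.0, 200), (3.0, 3000)]
for x, n in samples:
    print(x, n, theorem_4_holds(x, n))
```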


In Example 3 we had $n = 500$ and $p = \frac{1}{365}$. Thus $\mu = np = \frac{500}{365} = 1.37$. From Appendix A we find that $e^{-1.37} = .254$. The inequality (4.10) shows that $(1-p)^n$ is less than .254 but greater than $.254 - (1.37)^2/500 \times .254 \approx .253$. Thus $P_{500} = 1 - (1-p)^n \approx 1 - e^{-\mu} = 1 - .254 = .746$, in agreement with the value already given. Using Equation 4.10, we have

If $p < \frac{1}{2}$ and $\mu = np$, then

$$P_n \approx 1 - e^{-\mu} \tag{4.11}$$

with error at most $(\mu^2/n)e^{-\mu}$. In all cases, $P_n > 1 - e^{-\mu}$.
5 Example


A tax auditor decides that he will audit any tax return whose tax bill ends 
with 33 cents. He has 200 returns. What is the probability he will audit at 
least one of these? 

It is reasonable to assume independence and to take $p = .01$ = probability of a return with a tax bill ending with 33 cents. Thus $n = 200$, $p = .01$, and $\mu = np = 2$. By Appendix A, $e^{-\mu} = e^{-2} = .135$. Thus $P_{200} \approx 1 - .135 = .865 = 86.5$ percent. According to the inequality 4.10, $P_{200}$ is larger than .865 with an error of at most $e^{-\mu}(\mu^2/n) = (.135)(4/200) < .003$.
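Examples 3 and 5 can both be checked against Equation 4.11 and its error bound. The Python sketch below (the helper name is hypothetical) computes the exact value of $P_n$, the estimate $1 - e^{-\mu}$, and the bound $(\mu^2/n)e^{-\mu}$, assuming $p < \frac{1}{2}$ as Equation 4.10 requires.

```python
import math


def repeated_trials_estimate(p: float, n: int) -> tuple[float, float, float]:
    """Return (exact P_n, estimate 1 - e^{-mu}, error bound (mu^2/n) e^{-mu}),
    with mu = np, following Equations 4.1, 4.10 and 4.11 (assumes p < 1/2)."""
    mu = n * p
    exact = 1.0 - (1.0 - p) ** n
    estimate = 1.0 - math.exp(-mu)
    bound = (mu ** 2 / n) * math.exp(-mu)
    return exact, estimate, bound


for label, p, n in [("Example 3", 1 / 365, 500), ("Example 5", 0.01, 200)]:
    exact, estimate, bound = repeated_trials_estimate(p, n)
    print(f"{label}: P_n = {exact:.4f}, 1 - e^-mu = {estimate:.4f}, "
          f"error bound = {bound:.4f}")
```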


6 Example 


If an experiment has probability .001 of succeeding, how many times must 
this experiment be scheduled in order to ensure that it succeeds at least 
once, with probability larger than .95? 

We have $P_n > 1 - e^{-\mu}$. Thus we may ensure $P_n > .95$ by making $1 - e^{-\mu} > .95$. This is equivalent to $e^{-\mu} < .05$. Referring to Appendix A, we have $e^{-3.00} < .05$. Thus we choose $\mu = 3.00$. Also, we have $p = .001$. Thus $\mu = np$ implies that $3.00 = n(.001)$ and $n = 3000$.
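For comparison, the smallest $n$ that actually works in Example 6 can be found from logarithms, since $(1-p)^n < .05$ exactly when $n > \ln(.05)/\ln(1-p)$. The Python sketch below (hypothetical helper name) suggests that the exact minimum lies a little below the round figure 3000 obtained from the table value $e^{-3.00}$; $n = 3000$ is therefore sufficient, though not the smallest possible choice.

```python
import math


def smallest_n(p: float, target: float) -> int:
    """Smallest n with P_n = 1 - (1 - p)^n > target (0 < p < 1, 0 < target < 1)."""
    # (1 - p)^n < 1 - target  is equivalent to  n > log(1 - target) / log(1 - p)
    return math.floor(math.log(1 - target) / math.log(1 - p)) + 1


p, target = 0.001, 0.95
n_min = smallest_n(p, target)
print(n_min, 1 - (1 - p) ** n_min, 1 - (1 - p) ** 3000)
```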

If $\mu = np$ is large, the inequality $(1-p)^n < e^{-\mu}$ shows that $(1-p)^n$ is very small, and hence $P_n = 1 - (1-p)^n$ is very close to 1. To see how this operates, the cases $\mu = 4$, 5, and 6 yield $e^{-\mu} \approx .018$, .007, and .002, respectively, so $P_n > .981$, .993, and .997, respectively. For $\mu = 10$, $e^{-\mu} = .0000454$, and $P_n > .9999546 = 99.99546$ percent. Thus, if we have 1 chance in 1 million of succeeding ($p = 10^{-6}$), but if we try 10 million (independent) times ($n = 10^{7}$), then it is virtually certain that we will succeed. ($\mu = 10$, $P_n > .99995$.)

We are thus transported back to Chapter 1, where we imagined a large number of experiments and where we expected the relative frequency of an event $A$ to be near the probability $p = p(A)$. In particular, if $p > 0$, we expected, at any rate, that the event $A$ would occur if we had a large number of experiments. We shall now make this more formal. Since we do not wish the analysis to depend on the function $e^x$, we shall give a direct and simple proof of this result.


7 Theorem 


Let an event $A$ have probability $p > 0$. In $n$ independent trials, the probability $P_n$ that $A$ occurs at least once satisfies the inequality

$$1 - \frac{1}{np} < P_n \leq 1 \tag{4.12}$$

Also if $\epsilon$ is any positive number, then for some positive integer $N$,

$$1 - \epsilon < P_n \leq 1 \quad \text{if } n > N \tag{4.13}$$
To prove this result, we shall estimate $(1-p)^n$. Since $(1-p)^n(1+p)^n = (1-p^2)^n < 1$, we have

$$(1-p)^n < \frac{1}{(1+p)^n} = \frac{1}{1 + np + \binom{n}{2}p^2 + \cdots + p^n} < \frac{1}{np}$$

In this latter inequality we have ignored all terms except $np$, thereby obtaining a smaller denominator and a larger answer. Thus $1 \geq P_n = 1 - (1-p)^n > 1 - 1/(np)$. Also $1/(np) < \epsilon$ if $n > N = 1/(p\epsilon)$. In this case $P_n > 1 - 1/(np) > 1 - \epsilon$.

This completes the proof.

The inequality 4.12 is extremely weak. For example, if $\mu = np = 10$, this inequality yields $P_n > 1 - \frac{1}{10} = .9$, whereas we actually have $P_n > 1 - e^{-10} > .99995$. The weakness lies in the proof of Equation 4.12, where we replaced $(1+p)^n$ by $np$. Still, the inequalities 4.12 and 4.13 are strong enough to show that for fixed $p > 0$, if $n$ is large enough, $P_n$ will be very close to 1, and in fact will be as close to 1 as we want provided that $n$ is large enough.
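The gap between the bound (4.12) and the true value is easy to see numerically. The Python sketch below uses the case $p = 10^{-6}$, $n = 10^7$ from the text, and also compares the $N = 1/(p\epsilon)$ used in the proof of Theorem 7 with the smallest $n$ that actually gives $P_n > 1 - \epsilon$. The choice $\epsilon = .01$ and the helper names are our own assumptions for illustration.

```python
import math


def lower_bound_4_12(p: float, n: int) -> float:
    """The bound 1 - 1/(np) of Equation 4.12."""
    return 1 - 1 / (n * p)


def first_n_within(p: float, eps: float) -> int:
    """Smallest n with P_n > 1 - eps, i.e. (1 - p)^n < eps."""
    return math.floor(math.log(eps) / math.log(1 - p)) + 1


p, n = 1e-6, 10_000_000            # mu = np = 10, as in the text
print("bound (4.12):", lower_bound_4_12(p, n))
print("actual P_n:  ", 1 - (1 - p) ** n)

eps = 0.01
print("smallest n with P_n > 1 - eps:", first_n_within(p, eps))
print("N = 1/(p*eps) from the proof: ", 1 / (p * eps))
```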

The above results may be stated using the language of limits. We may state: If $0 < p < 1$, then $(1-p)^n \to 0$ as $n \to \infty$, and $P_n = 1 - (1-p)^n \to 1$ as $n \to \infty$. (The symbol “$\to$” is read “approaches.”) Theorem 7 may be paraphrased in the following interesting way.


7’ Theorem 


If an event $A$ has probability $p > 0$, and if the experiment is repeated indefinitely, then $A$ will occur with probability 1.


This statement is understood to be another way of stating that $P_n \to 1$ as $n \to \infty$.


EXERCISES 


Most of the exercises below need the values in Appendix A or B for 
numerical answers. 


1. When playing poker, the probability of picking up a flush or better is 
.00366. During one evening, Harold will play 100 games of poker. Approxi- 
mately what is the probability that he will pick up a flush or better during 
the evening? Use Equations 4.3 and 4.11 to obtain an approximation. Esti- 
mate how large your error is in each case. 


2. Each person in a class of 20 people is told to toss 3 dice 10 times and
record the sum. The teacher is reasonably sure that someone will have
tossed the sum 3 or 4. How sure is he of this? What is the probability that 
someone reports the sum 18? 
3. How many times should a pair of dice be tossed in order that a double 6 will occur with probability larger than $\frac{1}{2}$?


4. A printer will print a perfect page with probability .96. What is the
probability of his printing a perfect 100-page book?


5. A certain bank sends out all its statements automatically, using a 
machine. The machine will send out a wrong statement with probability 
.0007. The bank sends out 800 statements. What is the probability that a 
wrong statement was sent out? 


6. Using the tables of $e^x$ and $e^{-x}$, compute approximately:
a. (1.03)° b. (.996)? 
c. (.999)1-200 d. (1.000123)%° 
In each case, estimate the error. 


7. In playing roulette, the probability of winning on any one play is 3. 
Approximately what is the probability of winning at least once if 4 games are 
played? If 25 games are played? 


8. How many games of roulette (see Exercise 7) should a person plan to 
play in order to win at least once with probability greater than .9? 


9. One hundred and twenty-five people each choose a number from 1 
through 100 at random. What is the probability that the number 72 was 
chosen? 


10. Last year there were about 1,500,000 marriages in the United States. 
Estimate the probability that for at least one of these couples, both partners
had birthdates on June 24. Estimate the probability that for at least one of
these couples, both partners had birthdays on February 29. State what 
assumptions are involved and if they seem reasonable. 


11. Ten dice are tossed. Estimate the probability that at least one 6 occurs. 
Give an upper and lower bound for the probability. 


12. If a person has probability .0001 of winning a certain lottery, and if he 
buys 1 ticket for 50 consecutive years, estimate his chances of winning 
during these 50 years. 


13. A person picks up 5 cards from a deck and finds that they are {5S He, 
6 Di, 9 He, J Cl, K Di}. He claims that a very rare event occurred, because the probability of drawing this hand is $1/\binom{52}{5} \approx .0000004$. Yet no one is very surprised at this occurrence. Explain why not. Your explanation should also explain why most people would be impressed if his hand had been {10 Sp, J Sp, Q Sp, K Sp, A Sp}.
14. Each person in a class of 100 people tosses 10 coins. Approximately
what is the probability that at least 1 person tosses 10 heads or 10 tails?
What is the probability that at least 1 person has a 10-0 or a 9-1 distribution of heads and tails?


15. Using Equation 4.5 find the value of 
a. e! b. e7 
c. $\sqrt{e} = e^{1/2}$ d. $1/e = e^{-1}$
Compare with the values in Appendixes A and B. 


*2 INFINITE PROCESSES 


If a sample space or an event contains infinitely many outcomes, many of our 
previous techniques are not applicable. However, by approximating by 
finite sets, and by taking limits, it is possible to obtain a satisfactory notion of 
probability for these events. For example, if an event has probability $p > 0$ of occurring, we have considered in Section 1 the probability that the outcome will eventually occur if the experiment is repeated indefinitely. Our method was to consider $P_n$, the probability that the event occurs at least once during $n$ trials. If the original finite sample space was $S$, we thus considered $S^n$ (a finite space), and we found that $P_n = 1 - (1-p)^n$. Finally, the probability of eventual occurrence was taken to be 1, because $P_n \to 1$ as $n \to \infty$. (See Theorem 7 or 7’.) This is a reasonable answer, because, for example, we may arrange to have $P_n > .99999$ by choosing $n$ large enough. More generally, for any $\epsilon > 0$, no matter how small, we have $1 - \epsilon < P_n \leq 1$, provided that $n$ is large enough.

In this section we shall consider several examples involving infinite pro- 
cesses, and we shall find probabilities by using analogous limiting techniques. 
Before doing so, we state without proof some useful results about limits. 


8 Definition 


Let $a_n$ be a number for every positive integer $n$. We say that $a_n \to a$ (read: $a_n$ approaches $a$) as $n \to \infty$, or that the limit of $a_n$ is $a$, if for any $\epsilon > 0$ there is a number $N$ with the property that for all $n > N$, we have $a - \epsilon < a_n < a + \epsilon$.

Briefly, for large enough $n$, $a_n$ is within the range $a \pm \epsilon$. Informally, $a_n \approx a$ for large $n$. The number $\epsilon$ is the accuracy with which $a_n$ approximates $a$, and it may be arbitrarily prescribed as long as it is positive. The number $N$ (depending on $\epsilon$) tells us how far out in the sequence we must go before we are within $\epsilon$ of the limit $a$, and stay within $\epsilon$ of it.
9 Theorem. On Limits
If $a_n \to a$ and $b_n \to b$ as $n \to \infty$, then

$$a_n + b_n \to a + b \tag{4.14}$$
$$a_n - b_n \to a - b \tag{4.15}$$
$$a_n b_n \to ab \tag{4.16}$$
$$a_n / b_n \to a/b \qquad (\text{if } b_n \neq 0 \text{ and } b \neq 0) \tag{4.17}$$
$$c\,a_n \to ca \qquad (\text{for any constant } c) \tag{4.18}$$


The following result is useful and intuitively clear. 


10 Theorem 

If $a_n$ is increasing with $n$, and if $a_n \leq K$ for fixed $K$ and all $n$, then $a_n \to a$ for some value $a \leq K$. Similarly, if $a_n$ decreases with $n$, and if $a_n \geq L$, then $a_n \to a$ for some $a \geq L$.

In this theorem the word “increasing” means “$a_n \leq a_{n+1}$ for all $n$.” Thus we permit some or all of the terms to be stationary. Similarly, the word “decreasing” means “$a_n \geq a_{n+1}$ for all $n$.”

In many of our applications $a_n$ will be a probability. Thus $0 \leq a_n \leq 1$ and we may choose $K = 1$ or $L = 0$. The limit $a$ will therefore also satisfy $0 \leq a \leq 1$.

Finally, we may restate the result proved in Theorem 7 and generalize it to 
include some negative values. 


11 Theorem 


If $-1 < a < 1$, then $a^n \to 0$.
We recall the result of algebra concerning the infinite geometric series with ratio $r$:
$$a + ar + ar^2 + \cdots + ar^{n-1} + \cdots = \frac{a}{1-r} \qquad \text{if } -1 < r < 1 \tag{4.19}$$
By this we mean that the sum of $n$ terms approaches $a/(1-r)$ as $n \to \infty$. In fact,

$$a + ar + \cdots + ar^{n-1} = \frac{a(1 - r^n)}{1 - r} \tag{4.20}$$
and we may take limits using Theorem 11. Since $r^n \to 0$, the limit is $a/(1-r)$.
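Equation 4.20 can be checked with exact rational arithmetic. In the Python sketch below, $a = \frac{1}{2}$ and $r = \frac{1}{4}$ are sample values (they happen to be the ones needed in Example 12) and the function name is hypothetical; the partial sums agree with the closed form and creep up toward $a/(1-r)$.

```python
from fractions import Fraction


def geometric_partial_sum(a: Fraction, r: Fraction, n: int) -> Fraction:
    """a + ar + ... + ar^(n-1), accumulated term by term."""
    total, term = Fraction(0), a
    for _ in range(n):
        total += term
        term *= r
    return total


a, r = Fraction(1, 2), Fraction(1, 4)
for n in (1, 2, 5, 10, 20):
    s = geometric_partial_sum(a, r, n)
    closed_form = a * (1 - r ** n) / (1 - r)      # Equation 4.20
    print(n, float(s), s == closed_form)
print("limit a/(1 - r) =", a / (1 - r))          # Equation 4.19
```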
We now show how some of these results can be used to find probabilities. 


12 Example 
Alex and Bill alternately toss a coin. The first player to toss a head wins. 
Alex goes first. What is each player’s probability of winning? 

We shall give 2 methods for solving this problem. 
Explicit Solution. Alex wins if the event $A = \{H, TTH, TTTTH, \ldots\}$ occurs. Thus the probability that Alex wins is $p(A) = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots$. This is an infinite series with ratio $r = \frac{1}{4}$. By Equation 4.19 we have

$$p(A) = \frac{1/2}{1 - 1/4} = \frac{2}{3}$$
In the same way, Bill wins if the event $B = \{TH, TTTH, \ldots\}$ occurs. This probability is $\frac{1}{4} + \frac{1}{16} + \cdots = \frac{1}{3}$, again by Equation 4.19.


The use of infinite series to evaluate this probability suggests that a limiting probability is involved. This may be seen as follows. If we limit the game to at most $n$ tosses for Alex, the probability that he wins can be seen to be the finite sum $\frac{1}{2} + \frac{1}{8} + \cdots + \frac{1}{2}\left(\frac{1}{4}\right)^{n-1}$. The infinite series is the limit of this sum as $n \to \infty$.


Indirect Solution. Let $x$ = probability that Alex wins when it is his turn to toss, and let $y$ = probability that Alex wins when it is Bill’s turn to toss.

Then, when Alex tosses, he can win by tossing H (probability $\frac{1}{2}$), or by tossing T (probability $\frac{1}{2}$) and then winning (probability $y$). Thus $x = \frac{1}{2} + \frac{1}{2}y$. Similarly, if it is Bill’s turn, Alex can win by Bill’s tossing a tail (probability $\frac{1}{2}$) and then winning on his turn (probability $x$). Thus $y = \frac{1}{2}x$. (See Fig. 4.1 for an appropriate tree diagram for these equations.) Thus we have

$$x = \frac{1}{2} + \frac{1}{2}y, \qquad y = \frac{1}{2}x \tag{4.21}$$

Solving simultaneously, we obtain $x = \frac{2}{3}$, $y = \frac{1}{3}$. Thus Alex’s chance of winning is $\frac{2}{3}$. In a similar way, we may find that Bill’s chance is $\frac{1}{3}$.


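4.1 Tree Diagram for Coin-Tossing Game (panels: “How A Wins on His Turn,” H with probability $\frac{1}{2}$, or T with probability $\frac{1}{2}$ followed by a win on B’s turn, so $x = \frac{1}{2} + \frac{1}{2}y$; “How A Wins on B’s Turn,” T with probability $\frac{1}{2}$ followed by a win on A’s turn, so $y = \frac{1}{2}x$)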


Here $x$ and $y$ are also limiting probabilities. To see this, suppose we restrict the game to $n$ tosses. Then we have $x_n$ and $y_n$ as the probabilities that Alex wins if it is his or Bill’s toss, respectively. Now, once a player tosses a coin and a tail occurs, they are in the $(n-1)$-toss game. Thus we have

$$x_n = \frac{1}{2} + \frac{1}{2}y_{n-1}, \qquad y_n = \frac{1}{2}x_{n-1} \tag{4.22}$$
Now, it seems clear that increasing $n$ cannot decrease Alex’s chances of winning the game. Thus $x_n$ and $y_n$ are increasing, and we are assured that $x_n$ and $y_n$ have limits by Theorem 10: $x_n \to x$, $y_n \to y$. Finally, taking limits in Equation 4.22, we arrive at Equation 4.21. The infinite game was easier to handle, because at any unfinished stage of the game, the remaining game was always one of two possibilities (Alex’s or Bill’s turn) and did not depend upon how many tosses were left. In similar problems we shall use this infinite-game technique. The justification will be similar to the above remarks.
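The limiting argument can be watched directly by iterating Equation 4.22. We assume the natural starting point: in the 0-toss game nobody has won yet, so $x_0 = y_0 = 0$. The Python sketch below uses exact fractions and a hypothetical function name; the values increase, as claimed, toward $\frac{2}{3}$ and $\frac{1}{3}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def restricted_game(n: int) -> tuple[Fraction, Fraction]:
    """(x_n, y_n) from Equation 4.22, starting from x_0 = y_0 = 0."""
    x, y = Fraction(0), Fraction(0)
    for _ in range(n):
        # Both right-hand sides use the previous (n - 1)-toss values.
        x, y = HALF + HALF * y, HALF * x
    return x, y


for n in (1, 2, 3, 4, 10, 30):
    x, y = restricted_game(n)
    print(n, x, y, float(x), float(y))
```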


13 Example 


Players A and B play a game in which each has probability $\frac{1}{2}$ of winning. When any player wins, he wins a penny from the other. They keep on playing until one of the players has no coins. Assuming that A starts with 3 cents and B with 7, what are their respective chances of winning everything?

If we consider the number $n$ of coins which A has, we see that at any stage, $n$ changes to $n - 1$ or $n + 1$, each with probability $\frac{1}{2}$. This continues until $n = 0$ or 10, at which point the game terminates. We visualize this as a random walk. At any stage, we toss a coin to determine whether we go left or right for one unit. This continues until we reach the edge ($n = 0$ or $n = 10$). (See Fig. 4.2.)


4.2 Random Walk (points 0, 1, 2, ..., 10 on a line; from each interior point the walk moves one unit left or right, each with probability $\frac{1}{2}$)


We let $x_n$ be the probability that A will win everything if he has $n$ coins. (Geometrically, he is at point $n$.) Then we have

$$x_0 = 0, \qquad x_{10} = 1 \tag{4.23}$$
For $0 < n < 10$, a simple tree diagram shows that
$$x_n = \frac{1}{2}x_{n-1} + \frac{1}{2}x_{n+1} \qquad (0 < n < 10) \tag{4.24}$$

This system of equations constitutes 9 equations in 9 unknowns. We now seek its solution. To find it, we use a trick. Add $-\frac{1}{2}x_n - \frac{1}{2}x_{n-1}$ to both sides of Equation 4.24, to obtain $\frac{1}{2}x_n - \frac{1}{2}x_{n-1} = \frac{1}{2}x_{n+1} - \frac{1}{2}x_n$. Thus

$$x_{n+1} - x_n = x_n - x_{n-1} \qquad (n = 1, 2, \ldots, 9) \tag{4.25}$$

This is the system $x_1 - x_0 = x_2 - x_1 = x_3 - x_2 = \cdots = x_{10} - x_9$. Setting each of these numbers equal to $k$, we obtain

$$x_1 = x_0 + k, \quad x_2 = x_1 + k = x_0 + 2k, \quad x_3 = x_2 + k = x_0 + 3k, \quad \ldots, \quad x_{10} = x_0 + 10k$$
Using Equations 4.23 in the equation $x_{10} = x_0 + 10k$, we have $1 = 0 + 10k$. Thus $k = \frac{1}{10}$, and $x_n = x_0 + nk = 0 + n/10 = n/10$. Finally, the probability that A wins is $x_3 = \frac{3}{10}$. By the same reasoning applied to B, who starts with 7 coins, the probability that B wins is $x_7 = \frac{7}{10}$. Since $\frac{3}{10} + \frac{7}{10} = 1$, the game terminates with probability 1.

All these probabilities are to be interpreted as limits, as in Example 12. Note, however, that an explicit solution of this problem as a sum of an infinite series is not easy. It is easily seen that this example generalizes to an initial fortune of $a$ for A and $b$ for B. The probabilities of winning everything are proportional to the fortunes of A and B: $a/(a+b)$ for A and $b/(a+b)$ for B.
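The interpretation as limits can be made concrete for Example 13. The Python sketch below (hypothetical function name) computes, for every starting point, the probability that A wins everything within $T$ further games, by applying Equation 4.24 repeatedly with the boundary values of Equation 4.23 held fixed. As $T$ grows, the values settle at $n/10$. The optional parameter `p` is our own addition: it applies the same recursion with an unfair coin, which is Equation 4.26 of the next example.

```python
def win_within(total: int, steps: int, p: float = 0.5) -> list[float]:
    """x[n] = probability that A, holding n of the `total` coins, has won
    everything within `steps` more games (x[0] = 0, x[total] = 1 fixed)."""
    x = [0.0] * (total + 1)
    x[total] = 1.0
    for _ in range(steps):
        new = x[:]
        for n in range(1, total):
            # One more game: up a coin with probability p, down with 1 - p.
            new[n] = p * x[n + 1] + (1 - p) * x[n - 1]
        x = new
    return x


for steps in (10, 50, 200, 2000):
    print(steps, [round(v, 3) for v in win_within(10, steps)])
```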


14 Example 
In the above example, suppose A has probability $p$ of winning a game and that B has probability $q = 1 - p$ of winning. Again suppose that A starts with 3 pennies and B with 7, that the winner collects 1 cent after any win, and that the game terminates when one person has nothing left. What are the respective chances of winning everything?

The analysis is very similar to Example 13. Setting $x_n$ = probability A will win everything if he has $n$ pennies, we have again

$$x_0 = 0, \qquad x_{10} = 1 \tag{4.23}$$
However, this time the equations connecting the various probabilities $x_n$ are
$$x_n = p\,x_{n+1} + q\,x_{n-1} \qquad (0 < n < 10) \tag{4.26}$$

To solve these equations, subtract $p\,x_n$ from each side of Equation 4.26 to obtain

$$\begin{aligned}
x_n - p\,x_n &= p(x_{n+1} - x_n) + q\,x_{n-1} \\
q\,x_n &= p(x_{n+1} - x_n) + q\,x_{n-1} \\
q(x_n - x_{n-1}) &= p(x_{n+1} - x_n)
\end{aligned}$$

$$x_{n+1} - x_n = \frac{q}{p}(x_n - x_{n-1}) \qquad (0 < n < 10) \tag{4.27}$$
Now set $r = q/p$ and $x_1 - x_0 = a$. Then if we take $n = 1, 2, \ldots, k-1$ in Equation 4.27, we obtain

$$\begin{aligned}
x_1 - x_0 &= a \\
x_2 - x_1 &= ra \\
x_3 - x_2 &= r(x_2 - x_1) = r^2 a \\
&\;\;\vdots \\
x_k - x_{k-1} &= r^{k-1} a
\end{aligned}$$
Adding these equations, we have

$$x_k - x_0 = a(1 + r + r^2 + \cdots + r^{k-1}) = a\,\frac{1 - r^k}{1 - r} \tag{4.28}$$
Recalling that $x_0 = 0$, $x_{10} = 1$, we find (taking $k = 10$)
$$1 = a\,\frac{1 - r^{10}}{1 - r}, \qquad a = \frac{1 - r}{1 - r^{10}}$$

Substituting in Equation 4.28 and using $x_0 = 0$, we finally obtain

$$x_k = \frac{1 - r^k}{1 - r^{10}} \qquad \left(r = \frac{q}{p} \neq 1\right) \tag{4.29}$$
In our problem $k = 3$, so we have $x_3 = (1 - r^3)/(1 - r^{10})$, where $r = q/p$. The special case $p = \frac{1}{2}$ leads to $r = 1$, and in this case we are covered by Example 13, with $x_k = k/10$. The numerical results of Table 4.3 illustrate the relative importance of the size of $p$ in comparison to how far behind a player is. Intuitively, if $p > .5$, there is a tendency in a random walk to creep to the right, while if $p < .5$, the drift is to the left.


4.3 Probability $x_3$ (to 3 Decimal Places) of Winning All, if Player Is Behind 3 to 7 but Has Probability $p$ of Winning on Each Game
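The entries of a table like Table 4.3 can be recomputed from Equations 4.30 and 4.31. In the Python sketch below, the list of $p$ values is an arbitrary sample, not necessarily the one used in the table, and the function name is hypothetical.

```python
import math


def ruin_win_probability(k: int, total: int, p: float) -> float:
    """x_k: chance of reaching `total` coins before 0, starting from k,
    when each game is won with probability p (Equations 4.30 and 4.31)."""
    q = 1 - p
    if math.isclose(p, q):         # fair game, Equation 4.31
        return k / total
    r = q / p
    return (1 - r ** k) / (1 - r ** total)


for p in (0.40, 0.45, 0.50, 0.55, 0.60):
    print(f"p = {p:.2f}: x_3 = {ruin_win_probability(3, 10, p):.3f}")
```

A useful sanity check is that A’s chance with probability $p$ and B’s chance with probability $q$ add up to 1.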


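The method of this example clearly generalizes if there are $N$ coins in circulation. We then have

$$x_k = \frac{1 - r^k}{1 - r^N}, \qquad r = \frac{q}{p} \qquad \left(\text{if } p \neq \tfrac{1}{2}\right) \tag{4.30}$$

$$x_k = \frac{k}{N} \qquad \left(\text{if } p = \tfrac{1}{2}\right) \tag{4.31}$$

If we let $N \to \infty$ (with $k$ fixed), we are in the position of playing an infinitely rich opponent. There are 3 cases: If $p < \frac{1}{2}$, then $r > 1$. Dividing numerator and denominator in Equation 4.30 by $r^N$, we have

$$x_k = \frac{(1/r^N) - (1/r^{N-k})}{(1/r^N) - 1}$$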
Thus $x_k \to 0$ as $N \to \infty$. If $p = \frac{1}{2}$, we use Equation 4.31 and again $x_k \to 0$. But if $p > \frac{1}{2}$, then $r < 1$ and $r^N \to 0$. Thus $x_k \to 1 - r^k$. If we take complements, we have $1 - x_k = y_k = r^k$ = the probability of a player eventually losing his fortune. We summarize: Suppose in a random walk a player moves one unit right with probability $p$, and one unit left with probability $q = 1 - p$. Suppose he starts at $k > 0$. Then the probability that he will ultimately pass through $n = 0$ is 1 if $p \leq \frac{1}{2}$ but is $(q/p)^k$ if $p > \frac{1}{2}$.¹
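The passage to an infinitely rich opponent can be seen numerically by letting $N$ grow in Equation 4.30. In the Python sketch below, $k = 3$ and the two values of $p$ are arbitrary sample choices, and the function name is hypothetical. The ruin probability approaches $(q/p)^k$ when $p > \frac{1}{2}$ and approaches 1 when $p < \frac{1}{2}$.

```python
def ruin_probability(k: int, total: int, p: float) -> float:
    """1 - x_k from Equation 4.30: chance of hitting 0 before `total`,
    starting from k (assumes p != 1/2)."""
    r = (1 - p) / p
    return 1 - (1 - r ** k) / (1 - r ** total)


k = 3
for p in (0.51, 0.49):
    values = [ruin_probability(k, total, p) for total in (10, 100, 1000)]
    print(p, [round(v, 6) for v in values])
print("limit (q/p)^k for p = .51:", (0.49 / 0.51) ** k)
```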

We conclude this section by theoretically tying up the idea of the con- 
ditional probability p(A|B) with the experimental idea. 


15 Theorem 
Let A and B be events with p(B) > 0. Let the experiment S be repeated 
indefinitely. Then the probability that A occurs when B first occurs is 
p(A|B). 

For a proof, let us restrict the number of experiments to at most $n$. Let $T_n$ be the event (of $S^n$) that $B$ occurs, and that $A$ occurs also when $B$ first occurs. Set $a = p(A \cap B)$ and $c = p(\bar{B})$, where $\bar{B}$ is the complement of $B$, and set $x_n = p(T_n)$. The tree diagram


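4.4 Tree Diagram for $T_n$ (outcome of the first experiment: $A \cap B$, probability $a$, and $T_n$ occurs; $B - (A \cap B)$, and $T_n$ cannot occur; $\bar{B}$, probability $c$, after which $T_{n-1}$ must occur, contributing $c\,x_{n-1}$; hence $x_n = a + c\,x_{n-1}$)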


of Fig. 4.4 illustrates the two mutually exclusive ways in which $T_n$ can occur, depending on what happens on the first experiment. Thus

$$x_n = a + c\,x_{n-1} \tag{4.32}$$
Since increasing $n$ cannot decrease $x_n$, we have $x_n \to x$ by Theorem 10. Hence, taking limits in Equation 4.32, we have

$$x = a + cx$$

$$x = \frac{a}{1 - c} = \frac{p(A \cap B)}{1 - p(\bar{B})} = \frac{p(A \cap B)}{p(B)} = p(A|B)$$


1 The probability of eventually passing through $n = 0$, starting at $n = k$, should be interpreted as the limit of $P_{k,T}$, the probability of passing through $n = 0$ in $T$ or fewer steps, as $T \to \infty$. The above derivation did not give this. Thus this derivation must strictly be regarded as heuristic, although it can be patched up.
Since $x$ is simply “probability that $A$ occurs when $B$ first occurs,” we have the result.

One way of looking at this theorem is to imagine an experiment $S$ and two mutually exclusive events $A$ and $B$. We then have a race: If the experiment is run repeatedly, which event will occur first, $A$ or $B$? To find the probability that $A$ wins, we find the probability that $A$ occurs when $A \cup B$ first occurs. Thus, by Theorem 15, the probability that $A$ occurs first is $p(A|A \cup B) = p(A)/(p(A) + p(B))$. Similarly, the probability that $B$ occurs first is $p(B)/(p(A) + p(B))$. For example, suppose we toss 2 dice repeatedly. What is the probability that the sum 9 occurs before the sum 7? Here $p(\text{sum } 9) = \frac{4}{36}$, $p(\text{sum } 7) = \frac{6}{36}$. Hence

$$p(\text{sum 9 before sum 7}) = \frac{4/36}{4/36 + 6/36} = \frac{4}{10} = \frac{2}{5}$$
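The dice race is easy to check by enumerating the 36 equally likely outcomes. The Python sketch below (hypothetical helper names) computes $p(A)/(p(A) + p(B))$ exactly for the sums 9 and 7, assuming fair dice.

```python
from fractions import Fraction
from itertools import product


def sum_probability(target: int) -> Fraction:
    """Probability that two fair dice show the given sum."""
    hits = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == target)
    return Fraction(hits, 36)


def first_of(p_a: Fraction, p_b: Fraction) -> Fraction:
    """p(A occurs before B) = p(A) / (p(A) + p(B)) for mutually exclusive
    A and B (Theorem 15 applied to A given A or B)."""
    return p_a / (p_a + p_b)


print(first_of(sum_probability(9), sum_probability(7)))   # prints 2/5
```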
EXERCISES 


1. Suppose that players A and B have probabilities $p$ and $p'$, respectively,
of hitting a target. They alternate shooting at the target, with A going first. 
What is the probability that A hits the target first? That B hits the target first? 


2. In Exercise 1 prove that player A has a better chance than B of first 
hitting the target if and only if p > p'/(1+p’). Similarly, prove that A and B 
have equal chances if and only if p = p'/(1+p’). 


3. Suppose in Exercise 1 that A has two chances at the target to every
one that B has. Suppose p = ¢ and p’ = 3. Who has the better chance of first 
hitting the target? 


4. Charlie and Doug play the following game. One of the players tosses a 
die. If it lands 5 or 6, he wins; if it lands 1, he loses; and otherwise, it 
becomes the other player’s turn. What are Charlie’s chances of winning if he 
starts the game? What are his chances of winning if the players toss a (fair) 
coin to decide who goes first? 


5. Ed has $3.00 and he intends to gamble incessantly until he wins $5.00 
or goes broke. Suppose that the probability of his winning a game is p = .51. 
Find the probability that Ed wins $5.00 if he bets $1.00 per game, and also if 
he bets $.10 per game. Find these probabilities for p = .50 and for p = .49. 
(Note: Equation 4.10 and Appendixes A and B may be helpful.) 


6. In Exercise 5, with p = .51, suppose Ed decides to play indefinitely or 
until he goes broke. What is the probability that he will go broke if he starts 
with $3.00 and bets $1.00 per game? What if he bets $.10 per game? 


7. Fred is tossing coins and is keeping a count of heads and tails. He 
finds that he has 100 more heads than tails. He claims that there is an 
excellent chance that heads will always be in the lead even if he continues 
forever. Discuss this claim. 
8. The game of craps is played according to the following rules. Two dice are tossed and the sum $S$ is observed. If $S = 2$, 3, or 12, the player loses. If $S = 7$ or 11, the player wins. But if $S = 4$, 5, 6, 8, 9, or 10 (called the player’s “point”), the player continues to toss the dice until his point is tossed again or
until a 7 is tossed. If his point is tossed again, he wins; but if a 7 is tossed, he 
loses. Find the probability of winning. 


9. Following Equation 4.22, it was stated that it seemed “clear” that $x_n$ and $y_n$ do not decrease with $n$. Explain that statement.


10. In Fig. 4.5 a particle starts at vertex A and moves along 1 of the 3 paths leading from A, each with probability $\frac{1}{3}$. Similar random movements occur at B and at C when the particle is there. The walk stops at X, Y, or Z. What is the probability the particle ends up at X? At Y? At Z?


4.5 Random Walk About a Triangle


11. Three players, Xaviar, Yetta, and Zelda, play a fair game in which one 
of the players loses a penny to another, while the third does not gain or lose 
in this transaction. Suppose Xaviar, Yetta, and Zelda start with 1, 2, and 3 
pennies, respectively, and that they keep playing until one person has no 
money left. What is the probability that Xaviar loses his money first? What is 
the probability that Yetta loses and that Xaviar has 3 or more pennies after 
Yetta loses? (Hint: Plot the fortunes of Xaviar and Yetta as a point in the xy 
plane and take a random walk.) 


*3 COINCIDENCES 


A dictionary² definition of “coincidence” is “a group of concurrent events or circumstances remarkable for lack of apparent causal connection.” In this


2 Webster's Seventh New Collegiate Dictionary, G. & C. Merriam Company, Publishers, 
Springfield, Mass., 1965. 
section we shall consider probabilistic explanations of certain types of coincidences.


16 Example 


Twenty people independently choose, at random, an integer between 1 and 100 inclusive. What is the probability that 2 of the people choose the same number?

Let $p$ be the required probability. It is easier to compute $q = 1 - p$ = probability that all numbers are different. There are ${}_{100}P_{20}$ ways of choosing different numbers, and $100^{20}$ ways of choosing the numbers, all equally likely. Thus

$$q = \frac{{}_{100}P_{20}}{100^{20}} = \frac{100 \cdot 99 \cdot 98 \cdots (100 - 19)}{100^{20}} = \frac{100!}{100^{20}\,80!}$$
A computation using logarithms gives $q = .130$. Therefore, $p = 1 - q = .870$.
Thus the “coincidence” of the same number appearing twice has a rather 
high probability of occurring. In a practical situation, unless they know how 
to choose numbers at random, people usually will choose special numbers. 
For example, they might tend to choose odd 2-digit numbers, making a 
coincidence even more likely. 

We may write the expression for $q$ as a product as follows:

$$q = \frac{(100-1)(100-2)\cdots(100-19)}{100^{19}} = \left(1 - \frac{1}{100}\right)\left(1 - \frac{2}{100}\right)\cdots\left(1 - \frac{19}{100}\right)$$

This expression for $q$ may also be obtained directly by repeated application
of the product rule (Section 3 of Chapter 3).
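As a check on the logarithm computation, here is a minimal Python sketch (not from the text; the function name `all_different_prob` is our own). It uses exact rational arithmetic to confirm that ${}_{100}P_{20}/100^{20}$ equals the product form of $q$, and then rounds $q$ and $p$.

```python
from fractions import Fraction
from math import perm, prod

def all_different_prob(n: int, r: int) -> Fraction:
    """Exact q: probability that r independent uniform choices from n values are all different."""
    return Fraction(perm(n, r), n ** r)   # nPr favorable ordered choices out of n**r equally likely ones

q_exact = all_different_prob(100, 20)
# The same number written as the product (1 - 1/100)(1 - 2/100)...(1 - 19/100)
q_product = prod(Fraction(100 - i, 100) for i in range(1, 20))
assert q_exact == q_product
print(f"q = {float(q_exact):.3f}, p = {float(1 - q_exact):.3f}")
```

Exact fractions avoid any rounding until the final print; `math.perm` and `math.prod` require Python 3.8 or later.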


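More generally, if $r$ objects are chosen independently and with replacement
from among $n$, the probability that no repetitions occur is

$$q = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{r-1}{n}\right)$$

Thus the probability that the same object is chosen twice is

$$p = 1 - \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{r-1}{n}\right)$$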


In analogy with Equation 4.7, we state without proof the following approxi- 
mation for p. 


17 Theorem
For any $n$ and $r > 1$,

$$p = 1 - \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{r-1}{n}\right) \approx 1 - e^{-r(r-1)/(2n)} \qquad (4.33)$$

In all cases $p > 1 - e^{-r(r-1)/(2n)}$.

We shall not concern ourselves with an estimate of the error in using 
Equation 4.33. Table 4.6 gives values of p and its estimate for various values 
of r and for n = 100. Further computations show that for r= 13, the prob- 
ability of a match is greater than .5. 


4.6 Probability p That When r Numbers Are Chosen at Random
from n = 100 Numbers, 2 Will Be the Same

r                  5      10     15     20     25
p                 .097   .372   .669   .870   .962
Estimate for p    .095   .362   .650   .850   .950
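Both rows of Table 4.6, and the claim that $r = 13$ is the first value giving a match probability above .5, can be recomputed directly. The sketch below is our own (hypothetical function names `match_prob` and `match_estimate`); it evaluates the exact product and the estimate of Equation 4.33.

```python
import math

def match_prob(n: int, r: int) -> float:
    """Exact p: chance that r uniform choices (with replacement) from n values contain a repeat."""
    q = 1.0
    for i in range(1, r):
        q *= 1 - i / n          # multiply in the factor (1 - i/n)
    return 1 - q

def match_estimate(n: int, r: int) -> float:
    """Approximation 1 - exp(-r(r-1)/(2n)) of Equation 4.33; it is also a lower bound for p."""
    return 1 - math.exp(-r * (r - 1) / (2 * n))

for r in (5, 10, 15, 20, 25):
    print(r, f"{match_prob(100, r):.3f}", f"{match_estimate(100, r):.3f}")

# Smallest r for which a match is more likely than not when n = 100
r_half = next(r for r in range(2, 101) if match_prob(100, r) > 0.5)
print("smallest r with p > .5:", r_half)
```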


It should be noted that the coincidence probability $p$ depends roughly on the
square of $r$ and inversely on $n$, through the quantity $r(r-1)/2n$ in Equation 4.33.
Thus, if $n$ is increased, it is not necessary to increase $r$ proportionately to achieve
the same probability of a match. For example, when $n = 100$, we found that $r = 20$
yielded $p > .85$. If $n$ is increased to 1,000, it is not necessary to increase $r$ to 200
to yield $p > .85$. For if $n = 1{,}000$, we can ensure $p > .850$ by making
$e^{-r(r-1)/2000} < .150$. Appendix A shows that this can be achieved if
$r(r-1)/2{,}000 > 1.9$, because $e^{-1.90} = .1496$. Thus $r(r-1) > 3{,}800$. It suffices
to take $r = 63$. Thus if 63 numbers are chosen at random from 1 to 1,000 inclusive,
two of these numbers will be identical with probability larger than .85. Similarly, if
we want a probability larger than .99, it suffices to take $r = 98$.
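The search for the right $r$ can be automated. This is a sketch under the assumption that we use the lower bound of Theorem 17 exactly (rather than reading $e^{-x}$ from Appendix A); `smallest_r_for_bound` is a hypothetical helper name.

```python
import math

def smallest_r_for_bound(n: int, target: float) -> int:
    """Smallest r whose lower bound 1 - exp(-r(r-1)/(2n)) from Theorem 17 exceeds target."""
    r = 2
    while 1 - math.exp(-r * (r - 1) / (2 * n)) <= target:
        r += 1
    return r

print(smallest_r_for_bound(1000, 0.85))  # 63
print(smallest_r_for_bound(1000, 0.99))
```

For the .99 target the exact bound is already met at $r = 97$, so the text's $r = 98$ is safely sufficient.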

We can see this coincidence occurring in an everyday situation. If 30
people are gathered in a room, what is the probability that at least 2 of them
have the same birthday? Making our usual prohibition against February 29,
assuming all birthdates are equally likely, and assuming independence of
birthdates, we are in the situation discussed above with $n = 365$, $r = 30$.
Thus $r(r-1)/2n = (30 \cdot 29)/(2 \cdot 365) = 1.19$. Using Appendix A, we have
$p \approx 1 - e^{-1.19} = 1 - .304 = .696$. The chances are favorable that this
coincidence occurs. It turns out, in fact, that for $r = 23$ there is a better than even
chance that 2 people have the same birthday. For $r = 50$, using Equation 4.33, we
find that $p > .965$.
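The birthday figures can be checked with the same product formula, taking $n = 365$ as in the text. The sketch below (our own helper names) compares the exact probability with the estimate of Equation 4.33 for a few values of $r$.

```python
import math

def birthday_exact(r: int, n: int = 365) -> float:
    """Exact chance that at least two of r people share a birthday (n equally likely days)."""
    q = 1.0
    for i in range(1, r):
        q *= 1 - i / n
    return 1 - q

def birthday_bound(r: int, n: int = 365) -> float:
    """Estimate and lower bound 1 - exp(-r(r-1)/(2n)) from Equation 4.33."""
    return 1 - math.exp(-r * (r - 1) / (2 * n))

for r in (22, 23, 30, 50):
    print(r, round(birthday_exact(r), 4), round(birthday_bound(r), 4))
```

The exact value crosses one half between $r = 22$ and $r = 23$, and for $r = 50$ the bound just exceeds .965.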

Before considering another type of coincidence, it is convenient to intro- 
duce the following interesting generalization of Theorems 6 and 37 of 
Chapter 3. 
18 Theorem. The Inclusion-Exclusion Principle
Let $A_1, \ldots, A_n$ be events in a sample space $S$. Let

$$s_1 = \sum_{i} p(A_i)$$

$$s_2 = \sum_{i<j} p(A_i \cap A_j)$$

$$s_3 = \sum_{i<j<k} p(A_i \cap A_j \cap A_k)$$

$$\vdots$$

$$s_n = p(A_1 \cap \cdots \cap A_n)$$

Then

$$p(A_1 \cup \cdots \cup A_n) = s_1 - s_2 + s_3 - \cdots + (-1)^{n+1} s_n \qquad (4.34)$$


The idea of the equation is that $s_1 = \sum p(A_i)$ includes all the probabilities of
sample points in $A_1 \cup \cdots \cup A_n$. However, this sum may count some points
more than once. Thus we subtract $s_2 = \sum p(A_i \cap A_j)$, which includes all
probabilities counted more than once in $s_1$. However, we may have subtracted too
much, so we continue the process with $s_3$, etc.

To prove Equation 4.34 in general, we choose a sample point $s$ in
$A_1 \cup \cdots \cup A_n$ and find out how many times (positive, negative, or zero) it is
counted in the right-hand side of Equation 4.34. Suppose that the sample
point $s$ is in exactly $k$ of the sets $A_i$ ($k > 0$). Then the probability $p(s)$ is
counted $k$ times in $s_1$, $\binom{k}{2}$ times in $s_2$ (once for every pair of these $k$ sets),
$\binom{k}{3}$ times in $s_3$, etc. Thus $p(s)$ appears a total of

$$\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k+1}\binom{k}{k}$$

times in the right-hand side of Equation 4.34. But this sum is exactly 1,
because, by Equation 2.17,

$$1 - \binom{k}{1} + \binom{k}{2} - \cdots + (-1)^{k}\binom{k}{k} = 0 \qquad (k \ge 1)$$

Thus $p(s)$ appears exactly once in the right-hand side of Equation 4.34. This
completes the proof.

The method of proof may be illustrated with the help of Fig. 4.7. Here
$a, b, c, \ldots, i, j$ represent probabilities of the indicated points. Thus Equation
4.34 becomes

$$\begin{aligned}
a+b+c+d+e+f+g+h+i+j &= (a+b+c+d+e) + (d+e+f+g+h) + (c+e+g+h+i+j) \\
&\quad - [(d+e) + (c+e) + (e+g+h)] \\
&\quad + e
\end{aligned}$$

It is seen that the term $e$ corresponds to the case $k = 3$ and it occurs exactly
$3 - 3 + 1 = 1$ time. Similarly, $g$ occurs $2 - 1 = 1$ time. This corresponds to
$k = 2$.
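Equation 4.34 translates directly into a loop over all groups of $k$ events. The sketch below is our own (the name `union_prob_inclusion_exclusion` is hypothetical). It checks the formula on three events read off the expansion above, $A_1 = \{a,b,c,d,e\}$, $A_2 = \{d,e,f,g,h\}$, $A_3 = \{c,e,g,h,i,j\}$, with made-up probabilities $1/55, 2/55, \ldots, 10/55$ for the ten points.

```python
from fractions import Fraction
from itertools import combinations

def union_prob_inclusion_exclusion(weights: dict, events: list) -> Fraction:
    """p(A_1 U ... U A_n) computed as s_1 - s_2 + s_3 - ... (Equation 4.34).

    weights maps each sample point to its probability; events is a list of sets of points.
    """
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        s_k = Fraction(0)
        for group in combinations(events, k):          # every choice i_1 < ... < i_k
            common = set.intersection(*group)          # A_i1 ∩ ... ∩ A_ik
            s_k += sum((weights[x] for x in common), Fraction(0))
        total += (-1) ** (k + 1) * s_k                 # alternate signs: +s_1 - s_2 + s_3 ...
    return total

# Hypothetical probabilities for the ten points of Fig. 4.7 (they sum to 1)
weights = {x: Fraction(i + 1, 55) for i, x in enumerate("abcdefghij")}
A1, A2, A3 = set("abcde"), set("defgh"), set("ceghij")

direct = sum((weights[x] for x in A1 | A2 | A3), Fraction(0))
print(union_prob_inclusion_exclusion(weights, [A1, A2, A3]), direct)
```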


4.7 Inclusion-Exclusion Principle


Equation 4.34 may also be used for counting. If $S$ contains $k$ elements and
if a uniform sample space is taken, then we have $n(A) = p(A)k$. Thus,
multiplying Equation 4.34 by $k$, we obtain

$$n(A_1 \cup \cdots \cup A_n) = n_1 - n_2 + n_3 - \cdots + (-1)^{n+1} n_n \qquad (4.35)$$

where

$$n_1 = \sum_i n(A_i), \qquad n_2 = \sum_{i<j} n(A_i \cap A_j), \qquad \text{etc.}$$
19 Example 


How many 3-digit numbers are there which begin with the digit 5, end with 
the digit 7, or use 3 consecutive digits in some order? 

We let $S$ be the set of 3-digit numbers. Then define $B_5$ = the set of numbers
of $S$ which begin with 5; $E_7$ = the numbers of $S$ which end with 7;
Cons = the set of numbers that contain 3 consecutive digits. Then a simple
counting argument gives

$$n(B_5) = 10 \times 10 = 100$$

$$n(E_7) = 9 \times 10 = 90$$

$$n(\text{Cons}) = 8 \times 3 \times 2 \times 1 - 2 = 46$$

(A number of Cons is determined by the lowest digit appearing in it (0,
1, ..., 7) and then the locations of the digits. The two cases 012 and 021
are then eliminated.) Thus

$$n_1 = 100 + 90 + 46 = 236$$

Also,

$$n(B_5 \cap E_7) = 10, \qquad n(B_5 \cap \text{Cons}) = 3 \times 2 = 6, \qquad n(E_7 \cap \text{Cons}) = 3 \times 2 = 6$$

Thus

$$n_2 = 10 + 6 + 6 = 22$$

Also,

$$n_3 = n(B_5 \cap E_7 \cap \text{Cons}) = 1$$

Thus

$$n(B_5 \cup E_7 \cup \text{Cons}) = 236 - 22 + 1 = 215$$

Thus there are 215 such numbers.
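Because $S$ has only 900 elements, the count can also be confirmed by brute force. This sketch (our own helper names) builds the three sets explicitly and counts their union without using Equation 4.35 at all.

```python
def begins_with_5(m: int) -> bool:
    return str(m)[0] == "5"

def ends_with_7(m: int) -> bool:
    return str(m)[-1] == "7"

def consecutive_digits(m: int) -> bool:
    """True if the three digits, in some order, are d, d+1, d+2."""
    digits = sorted(int(c) for c in str(m))
    return digits[1] == digits[0] + 1 and digits[2] == digits[1] + 1

three_digit = range(100, 1000)   # the set S of 3-digit numbers
B5 = {m for m in three_digit if begins_with_5(m)}
E7 = {m for m in three_digit if ends_with_7(m)}
Cons = {m for m in three_digit if consecutive_digits(m)}

print(len(B5), len(E7), len(Cons))   # 100 90 46
print(len(B5 | E7 | Cons))           # 215
```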
We may use Theorem 18 to solve an interesting problem in coincidences 
which goes back to the eighteenth century. We first illustrate a special case. 


20 Example 
Ten balls (numbered 1 through 10) are placed at random into 10 boxes
(numbered 1 through 10), 1 ball to a box. What is the probability that at
least 1 of the balls occupies a box of its own number? (If ball $i$ is in box $i$ we
call this a match.)

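We let $A_i$ be the event that the ball numbered $i$ is in box number $i$
($i = 1, 2, \ldots, 10$). Then we wish to find $p(A_1 \cup A_2 \cup \cdots \cup A_{10})$.

To find $p(A_1)$ we note that ball 1 has probability $\frac{1}{10}$ of landing in box 1. Thus
$p(A_1) = \frac{1}{10}$. Similarly, $p(A_i) = \frac{1}{10}$. Thus

$$s_1 = p(A_1) + \cdots + p(A_{10}) = 10\left(\frac{1}{10}\right) = 1$$

To find $p(A_1 \cap A_2)$ we use the multiplication principle to obtain
$p(A_1 \cap A_2) = p(A_1)\, p(A_2 \mid A_1) = \left(\frac{1}{10}\right)\left(\frac{1}{9}\right)$. Similarly,
$p(A_i \cap A_j) = 1/(10 \cdot 9)$ for $i < j$.

Since there are exactly $\binom{10}{2} = (10 \cdot 9)/(2 \cdot 1)$ pairs $i, j$ with $i < j$, we have

$$s_2 = \sum_{i<j} p(A_i \cap A_j) = \frac{1}{10 \cdot 9} \cdot \frac{10 \cdot 9}{2 \cdot 1} = \frac{1}{2!}$$

In the same way we find $p(A_i \cap A_j \cap A_k) = 1/(10 \cdot 9 \cdot 8)$ and that there are
$\binom{10}{3} = (10 \cdot 9 \cdot 8)/(3 \cdot 2 \cdot 1)$ triples $i < j < k$. Thus

$$s_3 = \sum_{i<j<k} p(A_i \cap A_j \cap A_k) = \frac{1}{10 \cdot 9 \cdot 8} \cdot \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1} = \frac{1}{3!}$$

Continuing in this way, we find

$$s_k = \frac{1}{k!}$$

Finally, using Equation 4.34, we have

$$p(A_1 \cup \cdots \cup A_{10}) = \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots - \frac{1}{10!} = .6321$$

Thus at least one ball is in its own box with probability almost 2/3.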
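The alternating sum $\sum (-1)^{k+1} s_k$ with $s_k = 1/k!$ can be checked against direct enumeration of all rearrangements for small $n$. The sketch below is ours (hypothetical function names); enumeration is used only for $n \le 7$, since $n!$ grows quickly, and the $n = 10$ value comes from the formula.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_at_least_one_match(n: int) -> Fraction:
    """Equation 4.34 with s_k = 1/k!: 1/1! - 1/2! + ... + (-1)^(n+1)/n!."""
    return sum((Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1)), Fraction(0))

def p_at_least_one_match_brute(n: int) -> Fraction:
    """Fraction of rearrangements of 0..n-1 with some ball in its own box (small n only)."""
    hits = sum(1 for perm in permutations(range(n)) if any(perm[i] == i for i in range(n)))
    return Fraction(hits, factorial(n))

for n in range(1, 8):
    assert p_at_least_one_match(n) == p_at_least_one_match_brute(n)

print(f"{float(p_at_least_one_match(10)):.4f}")   # 0.6321
```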
It is easily seen that this example can be generalized as follows:
If $n$ objects are rearranged at random, then with probability

$$p_n = \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!} \qquad (4.36)$$

at least one of the objects will occupy its original position.

A calculation of $p_n$ from Equation 4.36 shows that $p_n$ changes very little
once $n$ is large enough. For example, $p_n = .6321$ to 4 decimal places for all
$n \ge 7$. (See Table 4.8.)


4.8 Values of $p_n$ (to 4 Decimal Places)


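Since $e^{-1} = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots$, we see that $p_n \to 1 - e^{-1}$ as $n \to \infty$:

$$p_n \approx 1 - \frac{1}{e} = .632120\ldots \qquad (n \text{ large}) \qquad (4.37)$$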


As noted above, for n as small as 7, 4-place accuracy is achieved by this 
approximation. 
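The values of Table 4.8 did not survive in this copy, so the loop below recomputes $p_n$ from Equation 4.36 for $n = 1, \ldots, 10$ (the table's original range is not known) together with its distance from $1 - 1/e$. This is our own sketch; since the series alternates, the distance is below $1/(n+1)!$.

```python
import math
from fractions import Fraction

def p_n(n: int) -> Fraction:
    """Equation 4.36: probability of at least one match among n randomly rearranged objects."""
    return sum((Fraction((-1) ** (k + 1), math.factorial(k)) for k in range(1, n + 1)), Fraction(0))

limit = 1 - 1 / math.e                     # Equation 4.37
for n in range(1, 11):
    value = float(p_n(n))
    print(n, f"{value:.4f}", f"{abs(value - limit):.1e}")
```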

We can find $p_n$ experimentally with cards. Shuffle a deck well. Then, from
the top of the deck, count off the cards one by one, while calling off the cards
in some definite order, say the ace through king of spades, followed by ace
through king of hearts, etc. Then the probability of a match is $p_{52} = .63212$.
(The error in approximating $p_{52}$ by $1 - 1/e$ is smaller than $10^{-60}$. We may use
Equation 4.37 unhesitatingly.) Many people consider that a correct call is
a remarkable coincidence.

Similarly, if letters are addressed to 10,000 people and if these letters
are stuffed into envelopes and mailed at random to these people, then with
probability .63212, at least one of these people will receive a letter addressed to him.

Using Equation 4.36, we may find the probability that exactly $r$ matches
occur in a random rearrangement of $n$ numbers. We let $P_{[r]}$ denote the
probability of exactly $r$ matches. (Here $n$ is given.) Clearly $r = 0$ is the
complement of the event “at least one match.” Thus
$P_{[0]} = 1 - p_n = 1 - 1/1! + 1/2! - \cdots + (-1)^n/n! \approx 1/e$. Again, we let $A_i$ be the
event that the $i$th number is a match. As before, $p(A_i) = 1/n$. The probability
that the number $i$ and only the number $i$ is matched is, by the multiplication
principle, $(1/n)(1 - p_{n-1})$. Since there are $n$ numbers, the probability of exactly
one match is $n(1/n)(1 - p_{n-1}) = 1 - p_{n-1}$. Thus
$P_{[1]} = 1 - 1/1! + \cdots + (-1)^{n-1}/(n-1)!$. Similarly, the probability that only
$i$ and $j$ ($i < j$) match is $[1/n(n-1)](1 - p_{n-2})$.

Since there are exactly $\binom{n}{2} = n(n-1)/2!$ pairs $i$ and $j$, we find that

$$P_{[2]} = \frac{n(n-1)}{2!} \cdot \frac{1}{n(n-1)}\,(1 - p_{n-2}) = \frac{1}{2!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{n-2}\frac{1}{(n-2)!}\right)$$


In a similar way, P,,; may be found. We summarize in the following theorem. 


21 Theorem 
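If $n$ numbers are rearranged at random, the probability $P_{[r]}$ of exactly $r$
matches is given by the equations

$$\begin{aligned}
P_{[0]} &= 1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{n}\frac{1}{n!} \approx \frac{1}{e} = .36788 \\
P_{[1]} &= \frac{1}{1!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{n-1}\frac{1}{(n-1)!}\right) \approx \frac{1}{1!}\cdot\frac{1}{e} = .36788 \\
P_{[2]} &= \frac{1}{2!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{n-2}\frac{1}{(n-2)!}\right) \approx \frac{1}{2!}\cdot\frac{1}{e} = .18394 \\
&\;\;\vdots \\
P_{[r]} &= \frac{1}{r!}\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{n-r}\frac{1}{(n-r)!}\right) \approx \frac{1}{r!}\cdot\frac{1}{e}
\end{aligned} \qquad (4.38)$$

The approximations are valid when $n - r$ is large.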
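Equation 4.38 can be tested against a full tally of fixed points over all $6! = 720$ rearrangements of 6 numbers. This is a sketch with our own function names; the brute-force tally is only practical for small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def exactly_r_matches(n: int, r: int) -> Fraction:
    """Equation 4.38: P_[r] = (1/r!) * (1 - 1/1! + 1/2! - ... + (-1)^(n-r)/(n-r)!)."""
    tail = sum((Fraction((-1) ** k, factorial(k)) for k in range(0, n - r + 1)), Fraction(0))
    return tail / factorial(r)

def exactly_r_matches_brute(n: int) -> Counter:
    """Tally the number of matches (fixed points) over all n! rearrangements (small n only)."""
    return Counter(sum(p[i] == i for i in range(n)) for p in permutations(range(n)))

n = 6
tally = exactly_r_matches_brute(n)
for r in range(n + 1):
    assert exactly_r_matches(n, r) == Fraction(tally[r], factorial(n))
print([str(exactly_r_matches(n, r)) for r in range(n + 1)])
```

Note that $P_{[n-1]} = 0$, as it must be: if all but one number is matched, the last one is matched too.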
EXERCISES
Wherever applicable, use the approximation formulas 4.33 or 4.37. 


1. Each person in a class of 10 people is told to go home and choose 1 
card from a well-shuffled deck of cards. Approximately what is the prob- 
ability that 2 of these people will choose the same card? 


2. How many people should be chosen at random from a telephone book 
to be sure, with probability greater than 95 percent, that 2 of these people 
will have the same birthday? 


3. Twenty people each choose 3 different integers at random from 1
through 10 inclusive. Approximately what is the probability that 2 of these
people choose the same set of integers?


4. Using Equation 4.35, find how many 4-digit numbers have different 
digits, and either have 1 as first digit, 2 as second digit, 3 as third digit, or 4 
as fourth digit. (Hint: Take S = set of 4-digit numbers with different digits.) 


5. Answer Exercise 4 if the numbers need not have different digits. 
(Hint: In this case the digits are independent, and Equation 3.26 may be 
used.) 


6. Three dice are tossed. What is the probability that a 1, or a 6, or three 
consecutive numbers turn up? 


7. How many integers from 1 through 9,000 are divisible by 6, 10, or 45?


8. Two people, independently and at random, choose 3 different integers
from 1 through 10. What is the probability that some number will be chosen
by both people? [Hint: Let $A_i$ be the event “$i$ is chosen by both people.”
Find $p(A_1 \cup \cdots \cup A_{10})$ from Equation 4.34.]


9. As in Exercise 8, suppose two subsets of size $r$ are chosen independently
and at random from a population of size $n$. Show that the probability
that the sets have nonempty intersection is

$$\frac{\binom{r}{1}^2}{\binom{n}{1}} - \frac{\binom{r}{2}^2}{\binom{n}{2}} + \frac{\binom{r}{3}^2}{\binom{n}{3}} - \cdots$$

In particular, prove that the above expression is equal to 1 if $n/2 < r \le n$.


10. By considering the complement event, show that the probability in 


146 ELEMENTARY PROBABILITY THEORY 


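Exercise 9 is $1 - \left[\binom{n-r}{r} \big/ \binom{n}{r}\right]$. Hence prove that

$$\frac{\binom{n-r}{r}}{\binom{n}{r}} = \frac{\binom{r}{0}^2}{\binom{n}{0}} - \frac{\binom{r}{1}^2}{\binom{n}{1}} + \frac{\binom{r}{2}^2}{\binom{n}{2}} - \cdots \qquad (4.39)$$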
11. A class of 30 takes seats at random in a classroom with a seating 
capacity of 30. The teacher then reads off her prearranged seating plan 
(also random). 
a. What is the probability that everybody has to move? 
b. What is the probability that exactly 3 people will not move? 


c. What is the probability that somebody does not move? 
d. What is the probability that 4 or more people will move? 


*4 SYMMETRY 


The notion that something is true “by symmetry” is often taken to be a
rather vague, largely intuitive idea, whose purpose is to avoid a really
mathematical argument. However, symmetry is a valid, rigorous notion which
appears in many parts of mathematics, and in this section we go into its
applications to probability theory.


22 Example 
If 5 dice are tossed, the probability that at least two 6’s occur is .196. What 
is the probability that at least two 3’s occur? 

Clearly, the answer is also .196 “by symmetry.” By this we mean that the 
roles of 3 and 6 are interchangeable in any consideration of this problem. 
Any derivation of the probability .196 can as well be done if the number 6 
is replaced by 3. 

To generalize this notion of symmetry, it is required to consider substitutions
of one sample point for another. In the above dice problem we would
substitute 6 for 3 and 3 for 6 in all sample points. This procedure is formalized
by the notion of a transformation, which may be defined on any set.


23 Definition 
A transformation of a set A is a method of assigning to every element x of A 
an element x’ of A such that 
a. For every x in A, there is one and only one y = x’ corresponding to it. 
b. For every y in A, there is one and only one x in A such that y = x’. 
When we have a particular transformation, we indicate it informally by
an arrow: $x \to x'$. If we wish to name the transformation, we use functional
notation and write $x' = f(x)$. In Example 22 it was convenient to
consider the interchange of the outcomes 3 and 6. We thus have the
transformation of Fig. 4.9.


4.9 Interchanging 3 and 6


In functional notation we may write 


f(3) = 6
f(6) = 3
f(x) = x   for x = 1, 2, 4, 5


Remark. A transformation of a set A is also called a permutation of A. 
This is seen to coincide with our previous notion (p. 54) if A is a finite set. 
In Fig. 4.9, for example, we may regard the interchange of 3 and 6 as the 
rearrangement 1 2 6 4 5 3, or as the ordered sample (1, 2, 6, 4, 5, 3) with- 
out replacement from the set {1, 2, 3, 4, 5, 6}. 

In the language of set theory, a transformation of a set A is a one-to-one 
mapping of the set A onto itself. 

We can now define a symmetry. 


24 Definition 
Let $S$ be a probability space and let $s' = f(s)$ be a transformation of $S$. We
say that $f$ is a symmetry of $S$ if $p(s) = p(s')$ for all sample points $s$.

In brief, it is required that sample points which correspond to each other 
shall have equal probability. 

If $S$ is a uniform space, then clearly any transformation of $S$ is a symmetry.
But if $S$ is not a uniform space, the symmetries are more limited. In all cases,
however, our intuitive idea that two sample points $s$ and $t$ play the same
role is equivalent to the more precise idea that some symmetry makes $s$
correspond to $t$.

If $f$ is a symmetry of $S$, then $f$ may be extended to a symmetry of $S \times S$ and
in general to $S^n$. We merely define $f(x_1, \ldots, x_n) = (f(x_1), \ldots, f(x_n))$. For
example, we may use the transformation of Fig. 4.9 and apply it to all sample
points of Example 22. Thus we would have $(4, 6, 2, 6, 3) \to (4, 3, 2, 3, 6)$. If
$f$ is a symmetry of $S$, then subsets $A$ of $S$ can be made to correspond using $f$.
Thus we define $f(A)$ = the set of all $x'$, where $x \in A$. Briefly, we replace
each $x$ of $A$ by $x' = f(x)$. We also write $A' = f(A)$.
25 Theorem
Let $s' = f(s)$ be a symmetry of the sample space $S$. Then if $A$ is any event
of $S$,

$$p(A') = p(A) \qquad (4.40)$$


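The proof is immediate. We have

$$\begin{aligned}
p(A') &= \sum_{s' \in A'} p(s') && [\text{definition of } p(A')] \\
&= \sum_{s \in A} p(s') && (\text{definition of } A') \\
&= \sum_{s \in A} p(s) && (\text{definition of symmetry}) \\
&= p(A) && [\text{definition of } p(A)]
\end{aligned}$$

The second step uses the fact that $f$ is one-to-one, so that as $s$ runs through $A$,
$s' = f(s)$ runs through $A'$, taking each value exactly once.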


To see how this theorem is used, let us reconsider Example 22. If
$A = \{1, 2, \ldots, 6\}$ is a uniform space, then the interchange $6 \to 3$, $3 \to 6$³ is a
symmetry. This clearly defines a symmetry on $S = A^5$. If $B$ is the event “at
least two 6’s,” then $B'$ is the event “at least two 3’s.” Thus $p(B) = p(B')$ by
Equation 4.40. For the remainder of this section, we offer other illustrations
of Theorem 25.
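For 5 dice the space $S = A^5$ has only $6^5 = 7776$ points, so Theorem 25 can be watched in action. The sketch below (our own names) applies the transformation of Fig. 4.9 to every point of $B$ and checks that the image is exactly the event “at least two 3’s.”

```python
from fractions import Fraction
from itertools import product

def swap_3_6(x: int) -> int:
    """The transformation of Fig. 4.9: interchange 3 and 6, leave 1, 2, 4, 5 fixed."""
    return {3: 6, 6: 3}.get(x, x)

outcomes = list(product(range(1, 7), repeat=5))          # S = A^5, all equally likely
B = {s for s in outcomes if s.count(6) >= 2}             # "at least two 6's"
B_image = {tuple(swap_3_6(x) for x in s) for s in B}     # B' = f(B)
at_least_two_3s = {s for s in outcomes if s.count(3) >= 2}

assert B_image == at_least_two_3s
print(f"{Fraction(len(B), len(outcomes))} = {len(B) / len(outcomes):.3f}")
```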


26 Example 
Five dice are tossed. Show that the probability that the sum is 13 is the same 
as the probability that the sum is 22. 

If $A = \{1, 2, 3, 4, 5, 6\}$ is the sample space of 1 die, we set up the symmetry

$$1 \leftrightarrow 6, \qquad 2 \leftrightarrow 5, \qquad 3 \leftrightarrow 4$$

In general, $x' = f(x) = 7 - x$. This symmetry then extends to $A^5$. If
$(x_1, x_2, x_3, x_4, x_5)$ is in the event “sum 13,” we have $x_1 + x_2 + x_3 + x_4 + x_5 = 13$. Then

$$x_1' + \cdots + x_5' = (7 - x_1) + \cdots + (7 - x_5) = 35 - (x_1 + \cdots + x_5) = 22$$

Conversely, if $x_1' + \cdots + x_5' = 22$, we obtain $x_1 + \cdots + x_5 = 13$. Thus


(sum is 13)’ = (sum is 22) 


By Equation 4.40 we have the result. 

This symmetry is particularly easy to visualize. On the usual die, the
opposite faces add up to 7: (4, 3), (5, 2), (6, 1) are opposite faces. Thus the
symmetry above has the effect of viewing 5 dice from “under the table.”
Since the “over plus under” sum is 7 for 1 die, it is 35 for 5, and hence the
sum 13 (over the table) is just as likely as the sum 22 (under the table). In
general, of course, any sum $n$ has the same probability as the sum $35 - n$.

³ In any such correspondence, it is understood that all elements $x$ which are not mentioned
stay fixed. Thus $x \to x$ if $x \ne 3$ and $x \ne 6$.
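The general statement (sum $n$ is as likely as sum $35 - n$) can be checked by tabulating the full distribution of the sum of 5 dice. A minimal sketch, counting all $6^5$ equally likely outcomes:

```python
from collections import Counter
from itertools import product

# Distribution of the sum of 5 fair dice, by counting all 6**5 equally likely outcomes
sum_counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=5))

# Symmetry x -> 7 - x: sum n corresponds to 35 - n, so the counts must agree
assert all(sum_counts[n] == sum_counts[35 - n] for n in range(5, 31))
print(sum_counts[13], sum_counts[22])
```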

The transformation $x \to 7 - x$ also converts “high” into “low.” Thus, for
example, when 5 dice are tossed, the probability that the highest number is
5 is equal to the probability that the lowest number is 2. Similarly (Exercise 
6, Section 1 of Chapter 1), if 5 dice are tossed and the numbers are arranged 
in increasing order, then the middle number is as likely to be 5 as 2. The 
reason in both cases is symmetry (i.e., Equation 4.40). 


27 Example 


If 3 cards are chosen from a deck (without replacement), what is the prob- 
ability that n cards are black (n= 0, 1, 2, and 3)? (This is Example 2.15, 
reconsidered.) 

We let the probabilities be $p_n$ ($n = 0, 1, 2$, and 3). We note that there is an
easy symmetry which interchanges black and red. (For example, we have
the symmetry clubs ↔ diamonds, spades ↔ hearts, keeping the rank unchanged.)
Under this symmetry, the event “no blacks” corresponds to “3 blacks.”
Similarly, “1 black” corresponds to “2 blacks.” Thus $p_0 = p_3$,
$p_1 = p_2$ by symmetry. Since $p_0 + p_1 + p_2 + p_3 = 1$, we have $2p_0 + 2p_1 = 1$.
Thus, once $p_0$ is found, $p_1$, $p_2$, $p_3$ are determined. Symmetry, however, does
not give us the value of $p_0$.
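To see the symmetry argument and the direct count agree, here is a short sketch (our own function name) that computes each $p_n$ from binomial coefficients, with 26 black and 26 red cards, and then recovers $p_1$ from $p_0$ alone using $2p_0 + 2p_1 = 1$.

```python
from fractions import Fraction
from math import comb

def p_black(n: int) -> Fraction:
    """Probability that exactly n of 3 cards drawn without replacement are black (26 black, 26 red)."""
    return Fraction(comb(26, n) * comb(26, 3 - n), comb(52, 3))

probs = [p_black(n) for n in range(4)]
assert probs[0] == probs[3] and probs[1] == probs[2]     # the red/black symmetry
assert 2 * probs[0] + 2 * probs[1] == 1
# Symmetry fixes p_1 once p_0 is known: p_1 = 1/2 - p_0
print(probs[0], Fraction(1, 2) - probs[0])
```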


28 Example 


Ten coins are tossed. Show that the probability of 3 or more heads is the 
same as the probability of 7 or fewer heads. 

Here we interchange heads and tails in the sample space $A = \{H, T\}$:
$H \to T$, $T \to H$, and use this to define a symmetry on $A^{10}$. The event $B$ = “3
or more heads” corresponds to the event $B'$ = “3 or more tails” = “7 or
fewer heads.” Since $p(B) = p(B')$, we have the result.

It might be argued that we have a false result if the coin is unfair. But in
this case the transformation $H \to T$, $T \to H$ would not be a symmetry,
because it is required that $p(H) = p(H') = p(T)$. Thus the fairness of the
coin was expressed by the statement that the above transformation was a
symmetry. A look at Fig. 2.14 will convince the reader that the above
symmetry translates into a geometric symmetry when properly graphed.
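The role of fairness can be made concrete. In this sketch (our own helper; the biased value $p(H) = .6$ is an arbitrary illustration) the two probabilities agree for a fair coin but not for a biased one, where $H \leftrightarrow T$ is no longer a symmetry.

```python
from math import comb

def prob_heads_between(n: int, lo: int, hi: int, p_head: float = 0.5) -> float:
    """Probability that the number of heads in n independent tosses lies in [lo, hi]."""
    return sum(comb(n, k) * p_head**k * (1 - p_head) ** (n - k) for k in range(lo, hi + 1))

fair = (prob_heads_between(10, 3, 10), prob_heads_between(10, 0, 7))
biased = (prob_heads_between(10, 3, 10, 0.6), prob_heads_between(10, 0, 7, 0.6))
print(fair)    # equal: 968/1024 each
print(biased)  # unequal once H <-> T is no longer a symmetry
```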


29 Example 


Two different cards are successively chosen from a deck. What is the prob- 
ability that the second card is higher in rank than the first? (Count the ace 
as the highest card.) 

If the cards are $(x, y)$, then the transformation $(x, y) \to (y, x)$ is a symmetry.
Thus, by symmetry, it is equally likely that the second card is higher
in rank than the first, as it is for it to be lower. Call this probability $p$. There
is also a possibility that the cards have the same rank. Whatever the first card is,
3 of the remaining 51 cards share its rank, so this probability is 3/51,
regardless of the first card. Thus the probability that both have the same
rank is 3/51 = 1/17. Finally, the events considered are mutually exclusive and
constitute all possibilities. Thus $1 = p + p + 1/17$. Solving for $p$, we obtain

$$p = \frac{8}{17}$$
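A direct count over all $52 \cdot 51 = 2652$ ordered pairs of different cards confirms the symmetry argument. The rank encoding below (2 through 14, with the ace as 14) is our own choice to make the ace the highest card.

```python
from fractions import Fraction
from itertools import permutations

RANKS = range(2, 15)                       # 2, ..., 10, J=11, Q=12, K=13, A=14 (ace high)
deck = [(rank, suit) for rank in RANKS for suit in "CDHS"]

pairs = list(permutations(deck, 2))        # ordered draws of two different cards
higher = sum(1 for first, second in pairs if second[0] > first[0])
lower = sum(1 for first, second in pairs if second[0] < first[0])
same = len(pairs) - higher - lower

print(Fraction(higher, len(pairs)), Fraction(same, len(pairs)))   # 8/17 1/17
```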


EXERCISES 


1. If 3 dice are tossed, the probability that a 2 or 5 is tossed is 19/27. What is
the probability that a 4 or 6 is tossed? That a 1 or 2 is tossed? In each case
state the symmetries involved.


2. Five dice are tossed. What is the probability that the sum of the faces 
is 17 or less? 


3. Using the results of Table 2.4, find the probability that, when 3 dice 
are tossed, the number 5 occurs and, in addition, no even number is tossed. 
State your symmetry. 


4. A die is tossed at most 3 times, or until a 1 or a 6 turns up. If a 1
turns up, Xaviar wins; if a 6 turns up, Yolanda wins; and if neither number 
occurs, Zelda wins. Using symmetry, show that Xaviar and Yolanda have 
equal chances of winning. Find the various probabilities of winning. 


5. An urn contains 5 red, 5 green, and 7 white marbles. Three marbles 
are chosen at random (without replacement). Using symmetry as far as 
possible, find the probability that more reds are chosen than greens. 


6. Prove: If $f$ is a symmetry of $S$ and if $g$ is a symmetry of $T$, then the
transformation $h$ of $S \times T$, defined by $h(s, t) = (f(s), g(t))$, is a symmetry.


7. Three cards are chosen successively (without replacement) from a 
deck. What is the probability that they are chosen in increasing order (count- 
ing ace as high)? 


8. An urn contains the integers 1 through 100. Three integers are suc- 
cessively chosen at random from the urn, without replacement. Find the 
probability they are chosen in increasing order. 


9. In Exercise 8 find the probability if the ordered sample is done with 
replacement. 


10. Generalize Exercises 8 and 9 if $r$ integers are chosen from the
integers 1 through $n$.


11. In Exercise 10 of Section 2, what can be said, using symmetry
considerations, about the probabilities of ending at X, Y, or Z?
12.⁴ In a well-shuffled deck, the ace of spades separates the remaining
cards into two parts, those on top of it and those beneath it. For any integer
$n$, prove that the probability that $n$ cards are on top of this ace is equal to
the probability that $n$ cards are beneath it.


13. In Exercise 12 the 4 aces divide the deck into 5 sections. Using 
symmetry considerations, prove that for every integer n, the probabilities 
p; = probability that the ith section has n cards (i= 1,2,...,5) are equal. 
State the symmetry you are using. 


⁴ Exercises 12 and 13 are discrete cases of a “principle of symmetry” discussed in F. Mosteller,
Fifty Challenging Problems in Probability, Addison-Wesley Publishing Company, Inc.,
Reading, Mass., 1965, pp. 59-61.


CHAPTER 5 RANDOM VARIABLES


INTRODUCTION 


Roughly speaking, a random variable is a number whose value is determined 
by the outcome of an experiment. It is a variable because it can be one of 
several numbers; it is random because the actual number depends on a 
probability experiment. Random variables have been part of probability 
theory since its beginnings, because they are invariably part of a gambling 
situation. For example, consider the following game in which 2 dice are 
tossed. The player wins $10 if both dice are 6 and he wins $1.00 if only 1 of
the dice is 6. But he loses $1.00 if no 6 appears. Here the amount W of
winnings in dollars is a random variable. The value of W can be 10, 1, or −1,
depending on the outcome of the experiment. 

We have already considered many random variables. A few examples are
the number of heads among 10 tossed coins, the number of spades in a
poker hand, and the relative frequency of an event A if an experiment is
repeated independently N times.

In this chapter we shall systematically investigate random variables and
many important concepts related to them. It will be necessary to sum
quantities extensively, so we shall start with a section on the Σ notation. We
shall then use this notation more systematically and less informally than in
the preceding sections.
1 SIGMA NOTATION


We have already used the symbol Σ as a general notation to designate a sum.
We shall now go into more detail and study some of its properties.


1 Definition
Suppose that $a$ and $b$ are integers with $a \le b$. Suppose that for each integer
$n$ between $a$ and $b$ inclusive, a number $x_n$ is given. Then we define

$$\sum_{n=a}^{b} x_n = x_a + x_{a+1} + \cdots + x_b \qquad (5.1)$$

The left-hand side is read “sigma $x$ sub $n$, $n = a$ to $n = b$.” The numbers $a$
and $b$ are called, respectively, the lower and upper limits of the summation.

In case $b = a$, we naturally interpret Equation 5.1 to mean $\sum_{n=a}^{a} x_n = x_a$.
Briefly, we substitute all values of $n$ from $a$ to $b$ and then add the results. For
example,

$$\sum_{n=2}^{4} x_n = x_2 + x_3 + x_4$$

$$\sum_{r=0}^{5} y_r = y_0 + y_1 + y_2 + y_3 + y_4 + y_5$$

$$\sum_{k=0}^{2} k^2 = 0^2 + 1^2 + 2^2 = 5$$

$$\sum_{i=1}^{4} (i+2) = (1+2) + (2+2) + (3+2) + (4+2) = 3 + 4 + 5 + 6 = 18$$

$$\sum_{j=4}^{7} 3 = 3 + 3 + 3 + 3 = 12$$

In the first two examples the specific values of $x_n$ and $y_r$ are not known. In
the next two examples we have the formulas $x_k = k^2$ and $y_i = i + 2$,
respectively, so the sum can be evaluated. In the last example $x_j = 3$ for all values
of $j$. Thus $x_4 + x_5 + x_6 + x_7 = 3 + 3 + 3 + 3$.

In a sum such as $\sum_{n=a}^{b} x_n$, the variable $n$ is called a dummy index. The only
role it plays is to be given values (from $a$ to $b$) and replaced by these values.
Any other letter can be used. Thus

$$\sum_{n=a}^{b} x_n = \sum_{k=a}^{b} x_k = \sum_{i=a}^{b} x_i$$

because each is shorthand for $x_a + x_{a+1} + \cdots + x_b$. A symbol such as $\sum_{n=1}^{n} x_n$ is
sometimes used to designate $x_1 + \cdots + x_n$. This is not a good notation as it
confuses the dummy variable $n$ and the number of terms $n$. We rather write
$\sum_{k=1}^{n} x_k$, in which the roles of $k$ and $n$ then become clear.
2 Example
Suppose $x_1 = 2$, $x_2 = 4$, $x_3 = 5$, and $x_4 = 9$. Compute $\sum_{i=1}^{4} x_i$, $\left(\sum_{i=1}^{4} x_i\right)^2$,
$\sum_{i=1}^{4} x_i^2$, and $\sum_{i=1}^{4} i x_i$.

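This is pure arithmetic:

$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 2 + 4 + 5 + 9 = 20$$

$$\left(\sum_{i=1}^{4} x_i\right)^2 = (x_1 + x_2 + x_3 + x_4)^2 = 20^2 = 400$$

$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 4 + 16 + 25 + 81 = 126$$

and

$$\sum_{i=1}^{4} i x_i = x_1 + 2x_2 + 3x_3 + 4x_4 = 2 + 8 + 15 + 36 = 61$$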
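Each Σ expression maps naturally onto Python's built-in `sum` over a range of indices. A small sketch of Example 2 (storing $x_1, \ldots, x_4$ in a dictionary is our own choice, to keep the indices starting at 1); note in particular that $\left(\sum x_i\right)^2$ and $\sum x_i^2$ differ.

```python
x = {1: 2, 2: 4, 3: 5, 4: 9}          # x_1, ..., x_4 from Example 2
idx = range(1, 5)                      # i = 1, ..., 4 (range excludes its stop value)

sum_x = sum(x[i] for i in idx)
square_of_sum = sum_x ** 2
sum_of_squares = sum(x[i] ** 2 for i in idx)
weighted = sum(i * x[i] for i in idx)

print(sum_x, square_of_sum, sum_of_squares, weighted)   # 20 400 126 61
```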


In what follows we shall usually state results summing from 1 to n rather 
than from a to b. In all cases there are analogous results for the limits a and b. 


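3 Theorem. Linearity of the Sum

$$\sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k \qquad (5.2)$$

and

$$\sum_{k=1}^{n} c\,a_k = c \sum_{k=1}^{n} a_k \quad \text{if } c \text{ is a fixed number} \qquad (5.3)$$

For the proof, we compute. All summations are understood to have lower
and upper limits 1 and $n$, respectively. We have

$$\begin{aligned}
\sum (a_k + b_k) &= (a_1 + b_1) + (a_2 + b_2) + \cdots + (a_n + b_n) \\
&= (a_1 + a_2 + \cdots + a_n) + (b_1 + b_2 + \cdots + b_n) \\
&= \sum a_k + \sum b_k
\end{aligned}$$

To prove Equation 5.3 we have

$$\sum c\,a_k = c a_1 + c a_2 + \cdots + c a_n = c(a_1 + a_2 + \cdots + a_n) = c \sum a_k$$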


Remark. Equation 5.2 may be generalized to the case where each summand
is the sum of 3 or more terms. For example,

$$\sum (a_k + b_k + c_k) = \sum a_k + \sum b_k + \sum c_k$$

This is an immediate consequence of Equation 5.2.
4 Theorem
For any constant $c$, we have

$$\sum_{k=1}^{n} c = nc \qquad (5.4)$$

For the proof we note that the left-hand side is the sum of $n$ terms, each of
which is $c$. Thus the sum is equal to $nc$.


Remark. The factor of $c$ is always the number of terms. Since there are
$b - a + 1$ terms between $a$ and $b$, inclusive, we have more generally

$$\sum_{k=a}^{b} c = c(b - a + 1) \qquad (5.5)$$


The above results express, in compact notation, the algebraic manipulations
of adding sums, factoring a constant out of a sum, and summing like terms.
We can also add in the reverse order, using the following theorem.


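5 Theorem

$$\sum_{k=1}^{n} a_k = \sum_{k=1}^{n} a_{n-k+1} \qquad (5.6)$$

For a proof, we note that

$$\begin{aligned}
\sum_{k=1}^{n} a_{n-k+1} &= a_{n-1+1} + a_{n-2+1} + \cdots + a_{n-n+1} \\
&= a_n + a_{n-1} + \cdots + a_1 \\
&= a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k
\end{aligned}$$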


The following theorem is useful for computing many sums. Some applica- 
tions are given in the exercises. 


6 Theorem. On Summing Differences

$$\sum_{k=1}^{n} (a_k - a_{k-1}) = a_n - a_0 \qquad (5.7)$$

For the proof, we write

$$\sum_{k=1}^{n} (a_k - a_{k-1}) = (a_1 - a_0) + (a_2 - a_1) + (a_3 - a_2) + \cdots + (a_n - a_{n-1})$$

In this sum all terms except $a_0$ and $a_n$ cancel. Thus we have the result.
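A numerical spot-check of Theorems 3 through 6 can catch index slips when these rules are applied by hand. The sketch below is our own: it tries random integer sequences (so every comparison is exact) and asserts each identity, storing $a_0, \ldots, a_n$ in a list so that `a[k]` is $a_k$.

```python
import random

def check_sigma_rules(n: int, seed: int = 0) -> None:
    """Spot-check Equations 5.2-5.4, 5.6 and 5.7 on random integer sequences."""
    rng = random.Random(seed)
    a = [rng.randint(-50, 50) for _ in range(n + 1)]   # a[0] is a_0; a[1..n] are a_1..a_n
    b = [rng.randint(-50, 50) for _ in range(n + 1)]   # b[0] unused
    c = rng.randint(-9, 9)
    ks = range(1, n + 1)                               # k = 1, ..., n

    assert sum(a[k] + b[k] for k in ks) == sum(a[k] for k in ks) + sum(b[k] for k in ks)  # (5.2)
    assert sum(c * a[k] for k in ks) == c * sum(a[k] for k in ks)                          # (5.3)
    assert sum(c for _ in ks) == n * c                                                     # (5.4)
    assert sum(a[k] for k in ks) == sum(a[n - k + 1] for k in ks)                          # (5.6)
    assert sum(a[k] - a[k - 1] for k in ks) == a[n] - a[0]                                 # (5.7)

for n in range(1, 30):
    check_sigma_rules(n, seed=n)
print("all checks passed")
```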
EXERCISES


1. Let $x_n$ be given by the following table:

Compute
a. $\sum_{n=0}^{5} x_n$     b. $\sum_{n=1}^{4} 10x_n$
c. $\sum_{n=1}^{5} (x_n - x_{n-1})$     d. $\sum_{n=0}^{5} (x_n - 1)^2$
5) x. 3 n 
. — f. ( x] 
2 n 2, , " 


2. Express each of the following sums compactly by using the summation 
sign. 
@ XgtXgtXs t+ + +X, D. Xp txXytXgt +X 


cc. $+3+it- +i d. i—d4+4—14---4h 
e. $1 + 2x + 3x^2 + 4x^3 + \cdots + nx^{n-1}$


3. Show that each of the following statements is true.

a. $\left(\sum_{k=1}^{n} x_k\right) + x_{n+1} = \sum_{k=1}^{n+1} x_k$


7 


Y xet ») Xp = > x 


k=n+1 


n n 
C. >) Xap t DY Xp = => Xk 


k=1 k=1 
d. $\sum_{j=1}^{n} \left(\sum_{k=1}^{j} x_k\right) = \sum_{k=1}^{n} (n - k + 1)\, x_k$
4. If $x_1, \ldots, x_m$ are $m$ numbers, express their average using the Σ notation.


5. Express $\sum_{i=1}^{n} (x_i - a)^2$ in terms of $\sum_{i=1}^{n} x_i^2$ and $\sum_{i=1}^{n} x_i$ by expanding and
using the theorems proved in the text.


6. Express $\sum_{i=1}^{n} (x_i - y_i)^2$ in terms of 3 sums, as in Exercise 5.
7. Using Theorem 6 prove that $\sum_{k=1}^{n} [k^2 - (k-1)^2] = n^2$. Simplify the
summand to prove $\sum_{k=1}^{n} (2k - 1) = n^2$. Hence prove $1 + 2 + \cdots + n = \sum_{k=1}^{n} k = n(n+1)/2$.


8. In analogy with Exercise 7, show that $\sum_{k=1}^{n} [k^3 - (k-1)^3] = n^3$. By
simplifying and using the result proved in Exercise 7, show that

$$1^2 + 2^2 + \cdots + n^2 = \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$$


9. In analogy with Exercises 7 and 8, and using the results of those
exercises, show that

$$1^3 + 2^3 + \cdots + n^3 = \sum_{k=1}^{n} k^3 = \frac{n^2(n+1)^2}{4}$$


10. Derive a formula for the sum

$$\sum (x_{n+1} - x_n)$$
11. Prove:

$$\sum_{k=a}^{b} x_k = \sum_{k=1}^{b-a+1} x_{k+a-1}$$


2 RANDOM VARIABLES AND THEIR DISTRIBUTIONS 


The notion of a random variable was alluded to in the introduction to this 
chapter. We now give its formal definition. 


7 Definition 
Let S be a probability space. A random variable X is a real-valued function 
defined on S. Thus, given any sample point s of S, there is a uniquely deter- 
mined value X(s). 

In theory, for finite sample spaces $S$, a random variable $X(s)$ can be given
quite arbitrarily by a table of values. Thus Table 5.1 gives a sample space $S$
together with a random variable on $S$.


5.1 A Random Variable 
However, random variables are usually defined in more natural ways, and
the values are then found directly from the definition. 


8 Example 
Three coins are tossed. Construct the table for the number of heads that 
turn up. 

We let X = the number of heads. Clearly, the value of X depends upon the 
particular occurrence s. Thus X = X(s) is a random variable. Its values are 
given in Table 5.2. As in many practical problems, however, the choice of 
a sample space is somewhat arbitrary. It would also be natural to choose $S$
and $X$ as in Table 5.3. Note that the random variables of Tables 5.2 and 5.3
are different, because they operate on different sample spaces. 


5.2 X(s) = Number of Heads

s       HHH   HHT   HTH   HTT   THH   THT   TTH   TTT
X(s)     3     2     2     1     2     1     1     0


It is possible to introduce the idea of the distribution of a random variable,
from which we may study certain aspects of random variables without
any reference to sample spaces. It then turns out that the distributions of
the random variables of Tables 5.2 and 5.3 are identical.


9 Definition 


Let X(s) be a random variable and let x be one of the possible values of
X(s). Let A(x) be the set of sample points s such that X(s) = x. Then the
function p(x) = p(A(x)) is called the distribution of X. The variable x is
assumed to range over all possible values of X(s).

The number p(x) is simply the probability that X(s) = x. We sometimes
write p(x) = p(X = x). In practice, we simply list the possible values of x in
some order $x_1, \ldots, x_n$ and give the distribution p(x) in tabular form. It is
important, however, that the x’s include all the possible values of X(s).
10 Example


Find the distribution of the random variable X of Table 5.2. 

The values of X(s) are seen to be 0, 1, 2, and 3. To find p(1), for example, 
we find A(1)—the set of s for which X(s) =1. By observation, A(1) = 
{HTT, THT, TTH}. Thus p(1) = p(A(1)) = 3/8. The other values for p(x)
are similarly found as in Table 5.4. It is seen that Table 5.4 also gives the 
distribution for the random variable of Table 5.3. 


5.4 Distribution of the Number
of Heads Among 3 Tossed Coins

x    |  0  |  1  |  2  |  3
p(x) | 1/8 | 3/8 | 3/8 | 1/8
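Definition 9 can be carried out mechanically. Enumerate the sample space, evaluate X(s) at each point, and add p(s) into the bin for the value X(s). The Python sketch below does this for the uniform space of n tossed coins, using exact fractions. The function name `heads_distribution` is our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=3):
    """Distribution p(x) of X = number of heads, on the uniform space of 2**n_coins outcomes."""
    sample_space = ["".join(s) for s in product("HT", repeat=n_coins)]
    p_point = Fraction(1, len(sample_space))   # each sample point has probability 1/2**n
    dist = Counter()
    for s in sample_space:
        dist[s.count("H")] += p_point          # add p(s) to p(A(x)) with x = X(s)
    return dict(sorted(dist.items()))
```

For 3 coins it reproduces Table 5.4. For any number of coins the values add up to 1, as Theorem 11 requires.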


11 Theorem 


Let p(x) be a distribution of a random variable, where $x = x_1, \ldots, x_n$ are
all the possibilities for x. Then

$$\sum_{i=1}^{n} p(x_i) = 1 \qquad (5.8)$$

$$0 \le p(x_i) \le 1 \qquad (i = 1, \ldots, n) \qquad (5.9)$$


To prove this theorem, suppose that X is the random variable and set
$A(x_i)$ = the set of s such that $X(s) = x_i$. Then the events $A(x_i)$ are clearly
mutually exclusive, and their union is S. (This latter statement is the meaning
of the hypothesis that the $x_i$ are all the possibilities for x.) Thus $\sum_{i=1}^{n} p(A(x_i)) = 1$,
by Equation 3.7. Since $p(A(x_i)) = p(x_i)$, Equation 5.8 follows. Equation 5.9 holds because
each $p(x_i)$ is the probability of an event.

Equations 5.8 and 5.9 show that a distribution p(x) may be regarded as a
probability space on the set $x_1, \ldots, x_n$ of real numbers. This is an important
step, because it brings us out of the apparatus of set theory into ordinary
algebra. The letters $x_1, \ldots, x_n$ do not represent outcomes or elements of a
sample space. They represent numbers! It is because of this that we are able
to draw meaningful graphs of distributions and to construct tables of certain
types of distributions. We shall consider some special distributions in
Chapter 7.

Equation 5.8 is often abbreviated $\sum p(x_i) = 1$ or even $\sum p(x) = 1$. In all
cases, when no upper and lower limits of summation are given, it is
understood that the sum extends over the entire range of the variable.

We can obtain a good intuition about the distribution p(x) of a random
variable by graphing the equation y = p(x). When the x’s are integers, it is
customary to plot a bar graph, in which the value $y = p(x_i)$ is plotted for all
x between $x_i - 1/2$ and $x_i + 1/2$. This gives the illusion of a more substantial
graph than would be obtained if only the n points $(x_i, p(x_i))$ were plotted. Figure
5.5 illustrates 2 types of graphical representation for the distribution of
Table 5.4. The scale along the y axis has been magnified for clarity. Note
that the height over a number x is the probability that the random variable
attains the value x, so it is easy to see graphically where the more likely
values of x are: these are the x values at which the heights are large. Bar
graphs of distributions are often called histograms. The reader should refer
to Fig. 2.14 for the graph of another distribution.

5.5 Graphs of a Distribution


12 Example 


A player tosses 2 dice. If two 6’s occur, he wins $10.00. If only one 6 occurs, 
he wins $1.00. If no 6 occurs, he loses $1.00. Let W = the winnings in 
dollars. Define the random variable W and give its distribution and its graph. 

The natural probability space is A², where A = {1, 2, ..., 6}, the points on
a die. (A is uniform.) The random variable W is defined by the formulas

$$W(x, y) = \begin{cases} 10 & \text{if } x = 6 \text{ and } y = 6 \\ 1 & \text{if } x = 6 \text{ and } y \ne 6 \\ 1 & \text{if } x \ne 6 \text{ and } y = 6 \\ -1 & \text{if } x \ne 6 \text{ and } y \ne 6 \end{cases}$$

The distribution p(x) of W is defined for x = 10, 1, and -1. Its values are
the probabilities of winning $10, $1, and -$1, respectively. Thus we have
the following table for p(x):

x    |  -1   |   1   |  10
p(x) | 25/36 | 10/36 | 1/36

Equivalently, we may write p(-1) = 25/36, p(1) = 10/36, p(10) = 1/36. The graph
(a histogram) is given in Fig. 5.6. The graph well illustrates a familiar aspect
of gambling: a high yield with a low probability.


5.6 Graph of Distribution of Winnings
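As a check on the table for p(x), the Python sketch below evaluates W on all 36 equally likely points of A × A and accumulates the probabilities as exact fractions. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def winnings(x, y):
    """W(x, y) of Example 12: 10 for two 6's, 1 for exactly one 6, -1 for no 6."""
    sixes = (x == 6) + (y == 6)              # booleans add as 0/1
    return {2: 10, 1: 1, 0: -1}[sixes]


def winnings_distribution():
    """Accumulate p(s) = 1/36 over the uniform space A x A, grouped by W(s)."""
    dist = {}
    for x, y in product(range(1, 7), repeat=2):
        w = winnings(x, y)
        dist[w] = dist.get(w, 0) + Fraction(1, 36)
    return dict(sorted(dist.items()))
```

Fractions print in lowest terms, so p(1) appears as 5/18 rather than 10/36.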


13 Example


Find the distributions of the random variables X(s) and Y(s) given by the 
following table: 


Here the possible values of x are x = 1, 4, or 5 in each case. If we let
$p_1(x)$ be the distribution of X, and $p_2(x)$ the distribution of Y, we can
compute $p_1(x)$ and $p_2(x)$ for each value of x. For x = 1, $p_1(1)$ = [probability that X(s) = 1]
= p(f) + p(g) = .3 + .1 = .4. Similarly, $p_2(1)$ = [probability that Y(s) = 1]
= p(a) + p(b) + p(e) = .1 + .1 + .2 = .4. In the same way we find that
$p_1(4) = p_2(4) = .2$, and $p_1(5) = p_2(5) = .4$. Thus the random variables X
and Y have identical distributions, although X and Y are quite different
random variables. The common distribution p(x) is given by the table

x    |  1 |  4 |  5
p(x) | .4 | .2 | .4
14 Example
Five cards are drawn, without replacement, from a deck. Find and graph the 
distribution for the number of black cards. 

Here x = 0, 1, ..., 5 are possible. A counting argument gives

$$p(0) = p(5) = \binom{26}{5} \Big/ \binom{52}{5} = .025$$

$$p(1) = p(4) = 26\binom{26}{4} \Big/ \binom{52}{5} = .150$$

$$p(2) = p(3) = \binom{26}{2}\binom{26}{3} \Big/ \binom{52}{5} = .325$$

A bar graph of the distribution is given in Fig. 5.7. Note that the symmetry
(interchanging red and black) shows up in the symmetry of the graph
about the point x = 2.5.


5.7 Number of Black Cards 
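The three values above come from one counting formula: choose x of the 26 black cards and 5 - x of the 26 red cards. The factor 26 in p(1) is just C(26, 1). The Python sketch below evaluates that formula with the standard-library `math.comb`. The function name and its defaults are our own.

```python
from math import comb


def black_card_distribution(hand=5, black=26, deck=52):
    """p(x) = C(black, x) * C(red, hand - x) / C(deck, hand) for x = 0, ..., hand."""
    red = deck - black
    total = comb(deck, hand)
    return {x: comb(black, x) * comb(red, hand - x) / total for x in range(hand + 1)}


dist = black_card_distribution()
for x, p in dist.items():
    print(x, round(p, 3))
```

The red/black symmetry shows up exactly: p(1) and p(4) are computed from the same integer product.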


15 Example 
A die is tossed 13 times, or until a 6 turns up. Find and graph the distribution 
for the number of tosses. 

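Here x = 1, 2, ..., 13, and

$$p(x) = \left(\tfrac{5}{6}\right)^{x-1}\left(\tfrac{1}{6}\right), \qquad x = 1, \ldots, 12$$

$$p(13) = \left(\tfrac{5}{6}\right)^{12}$$

These values are tabulated in Table 1.6. The graph is given in Fig. 5.8.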
If the experiment is to continue indefinitely until a 6 occurs, the distribution 
would be defined for all positive integers x, and we would be out of the realm 
of finite sample spaces. We may regard the height at x = 13 in Fig. 5.8 as 
the infinite sum of the heights at 13, 14,...in the unrestricted experiment. 
5.8 Number of Tosses Before a 6 Turns Up
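The truncated experiment of Example 15 can be tabulated exactly. The Python sketch below builds p(x) from the two formulas above. The function name and the `max_tosses` parameter are our own.

```python
from fractions import Fraction


def tosses_distribution(max_tosses=13):
    """Example 15: toss a die until a 6 appears, but stop after max_tosses tosses."""
    q, p = Fraction(5, 6), Fraction(1, 6)
    # the first 6 appears on toss x: x - 1 non-sixes followed by a six
    dist = {x: q ** (x - 1) * p for x in range(1, max_tosses)}
    # we reach toss max_tosses exactly when the first max_tosses - 1 tosses show no 6
    dist[max_tosses] = q ** (max_tosses - 1)
    return dist
```

The last value, (5/6)^12, equals the infinite tail sum of (5/6)^(x-1)(1/6) over x = 13, 14, ..., in agreement with the remark about Fig. 5.8. As a result, the 13 probabilities add up to exactly 1.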


EXERCISES 


1. Let X be the random variable defined by the following table: 


Find the distribution of X(s) and draw a bar graph of it. 


2. If S is a probability space, is the probability p(s) a random variable? 
If so, find the distribution of p(s) if S is a uniform space and if S is the 5-point 
space of Exercise 1. 


3. Two dice are tossed. Show that the sum S of the numbers tossed is 
a random variable, by giving an explicit formula for S. (Use the usual 36-
point sample space.) Find the distribution of S and graph it. 


4. As in Exercise 3, find the largest value M of the 2 numbers tossed, its
distribution, and its graph. [Use the symbol max (x, y) to denote the maxi- 
mum of x and y.] 


5. Find the distribution for the number of heads among 5 tossed coins. 
Draw a graph of this distribution. 


6. On a certain exam, the grades of the students were distributed as 
follows: Arthur 100 percent, Bernard 85, Carol 90, Doris 85, Eleanor 90, 
Frank 100, George 100, and Helene 100. Explain how the grade may be
regarded as a random variable and find its distribution.
7. In playing the game of roulette, a player will win $35.00 with
probability 1/38, but he will lose $1 with probability 37/38. Define and graph the
distribution of his winnings W.


8. A person tosses a coin 6 times and computes the relative frequency f 
of heads. Explain why f is a random variable, and graph the distribution of f.


9. An 8-card deck consists of the 4 aces and the 4 kings. It is well
shuffled. Compute and graph the distribution for the number n of
consecutive aces at the top of the deck (n = 0, 1, ..., 4).


10. In Exercise 9 explain why the distribution for n is the same as the dis- 
tribution for m = the number of consecutive red cards at the top of the deck. 
Does this mean that the random variables m and n are equal? Explain. 


3 EXPECTATION 


If X(s) is a random variable, it is sometimes convenient to find one number
which is thought of as typical of the various numbers X(s). The most 
commonly used, and the most useful for theoretical purposes, is the expecta- 
tion of X, also called expected value of X, the mean value or mean of X, or 
the average value of X. It is directly related to the usual average of n 
numbers. 


16 Example 


Referring to Table 1.1, what was the average high number during the first
100 trials? (The results are summarized below.)

Experimental Results: The High Number of 3 Dice
High number x | 1 | 2 | 3 |  4 |  5 |  6
Frequency n   | 1 | 6 | 5 | 12 | 33 | 43


Although the high numbers were 1, 2, 3, 4, 5, and 6, it would be wrong to
add these numbers and divide by 6. It is necessary also to consider the
frequency of these numbers and to count each number as many times as it
occurred. (Similarly, a student whose grades during a term were 100,
100, 100, and 50 does not add 100 and 50 and divide by 2 to obtain his
average.) The average high number $\bar{x}$ can be computed as follows:

$$\bar{x} = \frac{1 \cdot 1 + 2 \cdot 6 + 3 \cdot 5 + 4 \cdot 12 + 5 \cdot 33 + 6 \cdot 43}{1 + 6 + 5 + 12 + 33 + 43} = 4.99$$
We have used the formula

$$\bar{x} = \frac{\sum_{i=1}^{6} n_i x_i}{\sum_{i=1}^{6} n_i}$$

Here the numerator is the sum of all of the values $x_i$, each counted the proper
number of times, while the denominator is the number of experiments. Each
$x_i$ occurs $n_i$ times, so it contributes $n_i x_i$ to the sum. (The aforementioned
student would find his average similarly: $\bar{x} = (3 \cdot 100 + 1 \cdot 50)/(3 + 1) = 350/4 = 87.5$.)
The work in computing $\bar{x}$ may be done systematically using the
format of Table 5.9, which is very convenient for use with a desk calculator.


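5.9
x |  n |  nx
1 |  1 |   1
2 |  6 |  12
3 |  5 |  15
4 | 12 |  48
5 | 33 | 165
6 | 43 | 258
Totals: $\sum n = 100$, $\sum nx = 499$, so $\bar{x} = \sum nx / \sum n = 499/100 = 4.99$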


This process is generalized as follows: 


17 Definition 


Suppose an experiment has outcomes given by a finite sample space $S = \{s_1, \ldots, s_k\}$
and that X(s) is a random variable. Suppose that the experiment
is performed N times and that the frequency of the outcome $s_i$ is $n_i$ (i.e., $s_i$
occurred $n_i$ times). Then the average value of X for this sequence of
experiments is defined by the formula

$$\bar{x} = \frac{\sum_{i=1}^{k} n_i X(s_i)}{\sum_{i=1}^{k} n_i} = \frac{1}{N} \sum_{i=1}^{k} n_i X(s_i) \qquad (5.10)$$
We can write Equation 5.10 using the relative frequencies $f_i$. We recall
(Definition 2 of Chapter 1) that $f_i = n_i/N$. Thus, using Equation 5.10, we have

$$\bar{x} = \sum_{i=1}^{k} f_i X(s_i) \qquad (5.11)$$

In this formula there is no division by N. So to speak, the division has
already been done in computing the relative frequencies $f_i$. We may
illustrate Equation 5.11 using the data of Example 16. The computation appears
in Table 5.10.
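Equations 5.10 and 5.11 differ only in when the division by N happens. The Python sketch below computes both for the data of Example 16. The function names are our own. Because of floating-point rounding, the relative-frequency version is only guaranteed to agree to many decimal places.

```python
def average_from_frequencies(values, counts):
    """Equation 5.10: sum of n_i x_i divided by N = sum of n_i."""
    total = sum(n * x for x, n in zip(values, counts))
    return total / sum(counts)


def average_from_relative_frequencies(values, counts):
    """Equation 5.11: sum of f_i x_i, where f_i = n_i / N (no final division)."""
    N = sum(counts)
    rel_freqs = [n / N for n in counts]
    return sum(f * x for x, f in zip(values, rel_freqs))


high_numbers = [1, 2, 3, 4, 5, 6]
frequencies = [1, 6, 5, 12, 33, 43]   # data of Example 16
```

The same function also handles the student's grades: three 100's and one 50.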


Equation 5.11 immediately suggests the definition of the average value 
of a random variable. This definition will be given only in terms of prob- 
abilities and will not involve experimental results. It is motivated by our 
original intention that, for a large number of experiments, $f_i \approx p_i = p(s_i)$.
According to common usage, we use the term expectation of X, or expected 
value of X, rather than the equally valid terms mean of X, or average value 
of X. We also use the symbol E(X) or E(X(s)) to denote the expected value 
of X. 
18 Definition

Let X = X(s) be a random variable on a finite sample space $S = \{s_1, \ldots, s_k\}$.
The expectation of X, E(X), is defined by the formula

$$E(X) = \sum_{i=1}^{k} p(s_i) X(s_i) \qquad (5.12)$$

This formula may also be written

$$E(X) = \sum_{s \in S} p(s) X(s) \qquad (5.13)$$

or

$$E(X) = \sum_{i=1}^{k} p_i X(s_i) \qquad (5.14)$$

where $p_i = p(s_i)$.


By comparing Equation 5.12 with 5.11 and recalling the notion of prob- 
ability as a limiting relative frequency, we may give the following experi- 
mental significance of E(X): If a large number of experiments is performed, 
then the average of the experimental values of X(s) will very likely be very
close to E(X).

The probabilities for the outcomes in Example 16 were computed in
Example 6 of Chapter 2. We may thus use these results and Equation 5.13 to
compute E(X), where X is the high number of 3 dice. The computation
appears in Table 5.11. Thus E(X) = 4.96. The previous computation $\bar{x} = 4.99$
is quite close to E(X).


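5.11
x | p(x)   | x p(x)
1 |  1/216 |   1/216
2 |  7/216 |  14/216
3 | 19/216 |  57/216
4 | 37/216 | 148/216
5 | 61/216 | 305/216
6 | 91/216 | 546/216
Total: $\sum x\,p(x) = 1{,}071/216 = 4.96 = E(X)$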
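Table 5.11 can be confirmed directly from Definition 18 by summing p(s)X(s) over all 216 equally likely points of the space of three dice. The Python sketch below does so with exact fractions. The function name is our own.

```python
from fractions import Fraction
from itertools import product


def expected_high_number(n_dice=3):
    """Definition 18: E(X) = sum of p(s) X(s), with X(s) = largest face on n_dice dice."""
    points = list(product(range(1, 7), repeat=n_dice))   # uniform space of 6**n_dice points
    p = Fraction(1, len(points))
    return sum(p * max(s) for s in points)
```

The exact result is 1,071/216, which Python reduces to 119/24. It rounds to 4.96.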


Remarks. The average value x̄ of data is an abstract mathematical concept.
Thus, when we find that the average family has 2.2 children or the average
annual income in a certain city is $5,400 per year, there is no implication
that any family has 2.2 children or that anyone earns $5,400 per year. The
value x̄ is an attempt to reflect complicated data (a distribution of values)
with a single number. As such, it must be used with caution. For example,
it may be reassuring to learn that the average income of a family in the
United States is well above the poverty level, but this in no way replaces
the more meaningful distribution of income. If 99 people in a town earn
$1,000 per year, while 1 person earns $101,000 per year, the average
income in dollars for these 100 people is (99 · 1,000 + 1 · 101,000)/(99 + 1) =
2,000. On the other hand, if all 100 people earned $2,000, the average
income would again be $2,000. We cannot expect one figure (the average) to replace
the actual distribution.

Similarly, the number E(X) is also an abstract mathematical concept, 
which is often even further removed from reality, because it is based on 
probabilities that must be either assumed or approximated. Nevertheless, 
we shall find ample use for this notion in the sequel. 

We can use Equation 5.11 to compute experimental average values. All
that we do is replace the probability $p_i$ by the relative frequency $f_i$. This
was justified in Section 5 of Chapter 1. However, if data are given with
frequencies $n_i$, it is often more convenient to use Equation 5.10 than to compute
relative frequencies.

The data of Example 16 were given in terms of the distribution of the
random variable X(s). Although the underlying sample space had
$6^3 = 216$ sample points, there were only 6 possible values of X(s), so the
use of the distribution of X(s) simplified the problem. The following theorem
shows that we may compute E(X) using only the distribution of X.


19 Theorem 
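If p(x) ($x = x_1, \ldots, x_n$) is the distribution of the random variable X, then

$$E(X) = \sum x\,p(x) = \sum_{i=1}^{n} x_i\,p(x_i) \qquad (5.15)$$

To prove this, we write out the expression for E(X) and collect all terms
with the same X(s). Thus, recalling Definition 9,

$$\begin{aligned}
E(X) &= \sum_{s \in S} p(s)X(s) \\
&= \sum_{X(s)=x_1} p(s)X(s) + \sum_{X(s)=x_2} p(s)X(s) + \cdots + \sum_{X(s)=x_n} p(s)X(s) \\
&= \sum_{X(s)=x_1} p(s)\,x_1 + \cdots + \sum_{X(s)=x_n} p(s)\,x_n \\
&= x_1 \sum_{X(s)=x_1} p(s) + \cdots + x_n \sum_{X(s)=x_n} p(s) \\
&= x_1 p(x_1) + \cdots + x_n p(x_n) = \sum_{i=1}^{n} x_i\,p(x_i)
\end{aligned}$$

The last step uses $\sum_{X(s)=x_i} p(s) = p(A(x_i)) = p(x_i)$.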

We illustrate this theorem with a simple example. 
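20 Example

Three coins are tossed. Find the expected number of heads.
The computation using the definition is as follows. X(s) is the number of
heads in the sample point s:

s    | HHH | HHT | HTH | HTT | THH | THT | TTH | TTT
p(s) | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8
X(s) |  3  |  2  |  2  |  1  |  2  |  1  |  1  |  0

$$E(X) = \sum_{s \in S} p(s)X(s) = \tfrac{1}{8}(3+2+2+1+2+1+1+0) = \tfrac{12}{8} = 1.5$$

Using Theorem 19 and the distribution of Table 5.4, the computation is shorter:

x      |  0  |  1  |  2  |  3
p(x)   | 1/8 | 3/8 | 3/8 | 1/8
x p(x) |  0  | 3/8 | 6/8 | 3/8

$$E(X) = \sum x\,p(x) = \tfrac{12}{8} = 1.5$$

If, for example, 10 coins were involved, the definition of E(X) would give
a sum with 1,024 terms, but Theorem 19 reduces this to 11 terms. This is
the power of “collecting like terms.”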
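The 1,024-term versus 11-term comparison can be run directly. The Python sketch below computes E(X) for n coins both ways. For p(x) it uses the standard fact that C(n, x) of the 2^n head-tail sequences contain exactly x heads; this count is not derived in this section. The function names are our own.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expectation_by_definition(n):
    """Definition 18: one term p(s) X(s) for each of the 2**n sample points."""
    p = Fraction(1, 2 ** n)
    return sum(p * seq.count("H") for seq in map("".join, product("HT", repeat=n)))


def expectation_by_distribution(n):
    """Theorem 19: one term x p(x) for each value x = 0, ..., n, with p(x) = C(n, x) / 2**n."""
    return sum(x * Fraction(comb(n, x), 2 ** n) for x in range(n + 1))
```

Both functions give 3/2 for 3 coins and 5 for 10 coins. The second one does far less work.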


21 Example 
In Example 12, what are the expected winnings? 


The distribution found in Example 12 (page 161) is repeated here, and the computation is
given in the table

x     | p(x)  | x p(x)
-1    | 25/36 | -25/36
1     | 10/36 |  10/36
10    |  1/36 |  10/36
Total |       |  -5/36

We have E(W) = -5/36 = -.139. Thus the player of that game loses, on the
average, about $.14 per game.
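The expected winnings and the house's long-run gain can be computed exactly from the distribution above. The Python sketch below applies Theorem 19. The names `w_dist`, `expectation`, and `house_gain` are our own.

```python
from fractions import Fraction

# Distribution of the winnings W of Example 12 (values in dollars)
w_dist = {-1: Fraction(25, 36), 1: Fraction(10, 36), 10: Fraction(1, 36)}


def expectation(dist):
    """Theorem 19: E(W) = sum of x p(x) over the values x of W."""
    return sum(x * p for x, p in dist.items())


e_w = expectation(w_dist)
games = 10_000
house_gain = -e_w * games   # expected gain of the house over 10,000 games, in dollars
```

The exact expected gain over 10,000 games is 50,000/36, about $1,389. The figure $1,390 used below comes from the rounded value .139.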

What is the significance of this figure? For the gambling house that offers
this game, the figure -$.139 is very meaningful, because they are concerned
with a large number of “experiments.” They know that when 10,000 games
are played, they will, in all likelihood, be $1,390 ahead on this game (give 
or take a few hundred dollars). On the other hand, a person might decide to 
play this game only once or twice, so this long-range aspect will not appear 
so significant to him. Yet it seems reasonable to take the expected winnings 
as a fair measure of the value of this game to him. (This measure does not 
take into consideration the “value” of the gambling experience.) There are 
various theories that measure the value of a gambling situation to a person,
but the most commonly accepted one, and the one we use, is to take the
expected winnings as the measure of the value of the game.


22 Example 


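A die is tossed and the player wins, in cents, the number appearing on the
die. If a 6 occurs, then the player wins 6 cents and is also entitled to one
other throw, winning the second number also. It costs 4 cents to play the
game. Is it worth it?

Let X be the amount won, and let W = X - 4 be the net winnings, after paying the
entrance fee. If the first throw is 1, 2, 3, 4, or 5 (probability 1/6 each), X is that
number. If it is a 6, X = 6 + (second number) = 7, ..., 12 (probability 1/36 each). Thus

$$E(X) = \frac{1+2+3+4+5}{6} + \frac{7+8+\cdots+12}{36} = \frac{147}{36} = \frac{49}{12}$$

so the expected value of the net winnings W is 49/12 - 4 = 1/12 cent.
The game is favorable.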
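The Example 22 computation can be checked by walking through the two stages of the game. The Python sketch below assumes, as the wording suggests, that the extra throw earned by a 6 does not itself earn a further throw. The function name and the `entry_fee` parameter are our own.

```python
from fractions import Fraction


def die_game_expectation(entry_fee=4):
    """Example 22: win the face value (in cents); a 6 earns one extra throw, also won.

    Assumes the extra throw does not itself earn a further throw.
    Returns (E(X), E(W)) with X = amount won and W = X - entry_fee.
    """
    sixth = Fraction(1, 6)
    expected = Fraction(0)
    for first in range(1, 7):
        if first < 6:
            expected += sixth * first
        else:
            for second in range(1, 7):
                expected += sixth * sixth * (6 + second)
    return expected, expected - entry_fee
```

The function returns E(X) = 49/12 and E(W) = 1/12, a small edge in the player's favor.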

Although we shall not use the concept in what follows, it is worth mention- 
ing another important number, called the median, which is associated with 
a random variable or its distribution. Roughly speaking, the median is the
value $x_{\text{med}}$ such that it is just as likely that the value X(s) is above it as
below it. In terms of frequencies, half the results will be below the median
and half above it. We illustrate with 2 examples and then leave the formal 
definition to the reader. 
23 Example
Find the median value of x for the distribution of the following table:


To do this we find the cumulative probability P(x) = probability that X(s) ≤
x. [For example, P(3) = p(1) + p(2) + p(3) in this case.] The computation is


The median is $x_{\text{med}} = 4$. This is where the cumulative distribution first
exceeded .5.


24 Example 
Find the median value of x for the distribution of the following table: 


In this distribution the cumulative distribution is exactly .50 at x = 1.8.
This means that X(s) ≤ 1.8 with probability .5 but also that X(s) ≥ 2.1 with
probability .5. In this case it is customary to split the difference and to take
the average (1/2)(1.8 + 2.1) = 1.95 as the median. Note, however, that if we
had p(1.8) = .06 and p(2.1) = .14, we would have taken x = 1.8 as the median.

As these examples show, the median is completely insensitive to the
extreme values of x. If, in Example 23, we changed x = 6 to 6,000 and x = 1
to x = -1,000,000, the median would have remained 4. It is this insensitivity
to extremes that often dictates the use of the median. In the Remarks following
Definition 18 (page 169), we gave the example of 99 people earning $1,000 and 1 earning $101,000.
The average was $2,000, and it was that high because of the one atypical
income. The median is, of course, $1,000.
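The median rule of Examples 23 and 24 can be written as a short procedure. Take the first value at which the cumulative probability exceeds 1/2. If it equals 1/2 exactly, average that value with the next one. The Python sketch below implements this convention. It assumes the probabilities are given as exact fractions, so that the test for exactly 1/2 is reliable. The function names and the three-point test distribution are our own illustrations.

```python
from fractions import Fraction


def median(dist):
    """Median of {x: p(x)} using the convention of Examples 23 and 24.

    Accumulate P(x) = p(X <= x) over the values in increasing order. The median
    is the first x with P(x) > 1/2; if P(x) is exactly 1/2, average x with the
    next larger value ("split the difference").
    """
    xs = sorted(dist)
    half = Fraction(1, 2)
    cumulative = Fraction(0)
    for i, x in enumerate(xs):
        cumulative += dist[x]
        if cumulative == half and i + 1 < len(xs):
            return (x + xs[i + 1]) / 2
        if cumulative > half:
            return x
    return xs[-1]


def mean(dist):
    return sum(x * p for x, p in dist.items())


# Remarks after Definition 18: 99 people earn $1,000 and 1 person earns $101,000
incomes = {1_000: Fraction(99, 100), 101_000: Fraction(1, 100)}
```

For the income data the mean is 2,000 and the median is 1,000. Raising the single large income changes the mean but leaves the median at 1,000.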
On the other hand, a gambling house, or an insurance company, is quite
sensitive to extremes. Thus, if an insurance company occasionally will have
to pay off a large sum, it makes quite a difference to that company whether
it is $1,000,000 or $100,000,000. Therefore, we may be certain that the
insurance company will use the expected value of its payoffs, rather than
the median, when it computes its rates.


EXERCISES 


1. The daily maximum temperature in April 1966 in New York City is 
given in the table below. Compute the average maximum temperature for 
April. Also find the median maximum temperature. 


Date            |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15
Max. temp. (°F) | 49 | 49 | 56 | 53 | 54 | 55 | 57 | 52 | 53 | 55 | 57 | 59 | 44 | 60 | 66

Date            | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
Max. temp. (°F) | 62 | 63 | 66 | 57 | 47 | 69 | 64 | 68 | 55 | 73 | 73 | 52 | 42 | 60 | 55


2. Five coins are tossed. What is the average number of heads that 
turn up? 


3. A die is tossed and the number x turns up. What is the average value 
of x? What is the average value of x²?


4. Verify Theorem 19 for the random variable X of Table 5.12. 
5. Verify Theorem 19 for the random variable Y of Table 5.13.


6. Is it possible for the median of a random variable X to equal the mean 
of X? Give an example. 


7. Two (different) cards are chosen from a standard deck of cards. What
is the average number of aces that are chosen? What is the average number 
of diamonds that are chosen? 


8. Two dice are tossed. What is the expected value of the sum of the 
numbers that turn up? 


9. Two dice are tossed. What is the expected high number that turns 
up? What is the expected low number?


10. Two dice are tossed. What is the expected difference between the 
high and low number? 


11. Five coins are tossed. On the average, how many heads turn up? 
12. Prove: If a ≤ X(s) ≤ b for all sample points s, then a ≤ E(X) ≤ b.
13. Prove: If X(s) = c (a constant) for all s, then E(X) = c.


14. An experiment has probability p of succeeding. It is repeated (inde- 
pendently) 3 times. What is the expected number of successes? 


15. A whole number is chosen at random between 1 and 10 inclusive. On 
the average, how many divisors has the number? (For example, the number 
10 has 4 divisors: 1,2, 5, and 10.) 


16. A ball is placed into one of 4 cups (numbered 1, 2, 3, and 4). This is 
repeated, independently, with 2 other balls. Let X be the largest number of 
balls occurring in a cup. (Thus X = 1, 2, or 3.) Find E(X).


17. In Exercise 16 let Y be the number of the first cup with at least 1 ball
in it. (Thus Y = 3 means that cups 1 and 2 are empty but cup 3 has something
in it.) Find E(Y).
4 ALGEBRA OF EXPECTATIONS


There are a few theorems concerning the expectation E(X) which simplify 
computations and which are of great theoretical importance. Before pro- 
ceeding with these results, it is necessary to define the sum and product of 
random variables. 


25 Definition 


Let X = X(s) and Y = Y(s) be random variables defined on a probability
space S. The sum X + Y and the product XY are random variables on S
defined by the formulas

(X+Y)(s) = X(s) + Y(s)     (5.16)
(XY)(s) = X(s)Y(s)         (5.17)

If c is any constant, we define the product cX by the formula¹
(cX)(s) = c · X(s)         (5.18)


26 Example 


Let X(s), Y(s) be random variables as in Table 5.14. Find the random 
variables X + Y, XY, and 10Y. Find the mean of X, Y, X+ Y, XY, and 10Y. 


Equation 5.16 shows that the values of X + Y can be found by simply adding 
the values of X to the corresponding values of Y. XY and 10Y can be found 
similarly. The values of X+Y, XY, and 10Y are given in Table 5.15. The 
means are computed in the table. We note that E(X) + E(Y) = 1.9 + 1.6 =
3.5. Also, E(X+Y) = 3.5. Thus E(X+Y) = E(X) + E(Y). Similarly,


¹ Any constant c may be regarded as a random variable whose only value is c. With this
understanding, Equation 5.18 is a special case of Equation 5.17, because

(cX)(s) = c(s)X(s) = c · X(s)
5.15

s        |  a |  b |  c | Mean
p(s)     | .3 | .5 | .2 |
X(s)     |  1 |  2 |  3 | (1)(.3) + (2)(.5) + (3)(.2) = 1.9 = E(X)
Y(s)     |  3 |  1 |  1 | (3)(.3) + (1)(.5) + (1)(.2) = 1.6 = E(Y)
(X+Y)(s) |  4 |  3 |  4 | (4)(.3) + (3)(.5) + (4)(.2) = 3.5 = E(X+Y)
(XY)(s)  |  3 |  2 |  3 | (3)(.3) + (2)(.5) + (3)(.2) = 2.5 = E(XY)
(10Y)(s) | 30 | 10 | 10 | (30)(.3) + (10)(.5) + (10)(.2) = 16 = E(10Y)


E(10Y) = 16, while 10E(Y) = (10)(1.6) = 16. Thus E(10Y) = 10E(Y).²
We now state and prove these results in general. 


27 Theorem 
Let X and Y be random variables on the probability space S, and let c be a 
constant. Then 


E(X+Y) = E(X)+E(Y) (5.19) 
E(cX) = cE(X) (5.20) 
E(c)=c (5.21) 


Proof. We merely calculate the left-hand side and reduce it to the right- 
hand side. By Equations 5.12 and 5.16 we have 


$$\begin{aligned}
E(X+Y) &= \sum_{i=1}^{k} p(s_i)\,(X+Y)(s_i) \\
&= \sum_{i=1}^{k} p(s_i)\,[X(s_i) + Y(s_i)] \\
&= \sum_{i=1}^{k} [\,p(s_i)X(s_i) + p(s_i)Y(s_i)\,] \\
&= \sum_{i=1}^{k} p(s_i)X(s_i) + \sum_{i=1}^{k} p(s_i)Y(s_i) \\
&= E(X) + E(Y)
\end{aligned}$$


² Note, however, that E(X)E(Y) = (1.9)(1.6) = 3.04, while E(XY) = 2.5. The equation E(XY) = E(X)E(Y)
is not true in general. Theorem 46 gives a condition under which this equation will be true.
Similarly,

$$E(cX) = \sum_{s \in S} p(s)\,(cX)(s) = \sum_{s \in S} p(s)\,c\,X(s) = c \sum_{s \in S} p(s)X(s) = cE(X)$$

Finally,

$$E(c) = \sum_{s \in S} c\,p(s) = c \sum_{s \in S} p(s) = c \cdot 1 = c$$

This proves the theorem.
If a random variable is the sum of more than 2 random variables, we can
easily extend Equation 5.19. Thus E(X+Y+Z) = E(X+Y) + E(Z) =
E(X) + E(Y) + E(Z). In general, we have

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$
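Theorem 27 and the warning in the footnote can both be checked on the data of Table 5.15. The Python sketch below represents each random variable as a dictionary from sample points to values and applies Equations 5.16 through 5.18 pointwise. The helper names `E`, `add`, `mul`, and `scale` are our own.

```python
from fractions import Fraction

# Sample space {a, b, c} and random variables from Table 5.15
p = {"a": Fraction(3, 10), "b": Fraction(5, 10), "c": Fraction(2, 10)}
X = {"a": 1, "b": 2, "c": 3}
Y = {"a": 3, "b": 1, "c": 1}


def E(rv):
    """Definition 18: E(rv) = sum over s of p(s) rv(s)."""
    return sum(p[s] * rv[s] for s in p)


def add(U, V):      # (U + V)(s) = U(s) + V(s), Equation 5.16
    return {s: U[s] + V[s] for s in p}


def mul(U, V):      # (UV)(s) = U(s) V(s), Equation 5.17
    return {s: U[s] * V[s] for s in p}


def scale(c, U):    # (cU)(s) = c U(s), Equation 5.18
    return {s: c * U[s] for s in p}
```

E(add(X, Y)) equals E(X) + E(Y), and E(scale(10, Y)) equals 10 E(Y). By contrast, E(mul(X, Y)) = 5/2 differs from E(X)E(Y) = 76/25.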


28 Example 


The average height, in feet, of the students in a class is 5.5. What is the 
average height, in inches? 


Method. One would suppose that the answer is 5.5 × 12 = 66 (inches), and
it would appear that there is no problem. The reason for this supposition is 
that our feeling for changing units is ordinarily so ingrained that we take its 
properties for granted. 

In this problem let the (uniform) sample space S be the set of students, 
and let X(s) = the height, in feet, of student s. Let Y(s) = the height, in
inches, of student s. Then, because of the usual relationship between feet
and inches, Y(s) = 12X(s) for all s, or


Y= 12X 
Therefore, by Equation 5.20, 


E(Y) = E(12X) = 12E(X) 


Since we were given E(X) =5.5, it follows that E(Y) = 12E(X) = 66, as 
noted above. 

This example shows that the formula E(cX) = c · E(X) may be
interpreted as follows: If the values of a random variable X are measured in
certain units (i.e., feet, dollars, cubic inches, etc.), then the number E(X) 
may be regarded as having that unit. 
29 Example

Ten coins are tossed. What is the average number of heads that turn up?

The sample space may be regarded as consisting of 1,024 sample points.
For each sample point s, we let N(s) = number of heads of s. Thus, if
s = HHTHTTTHTH, N(s) = 5. We wish to compute E(N). We let

$$N_1(s) = \begin{cases} 1 & \text{if } s \text{ starts with H} \\ 0 & \text{if } s \text{ starts with T} \end{cases}$$

Briefly, $N_1(s)$ = number of heads on the first toss. Similarly, we let $N_2(s)$ =
the number of heads on the second toss, and in the same way, $N_3, N_4, \ldots, N_{10}$
are defined. In each case, $N_i = 1$ or 0. [For the above sample point
s = HHTHTTTHTH, $N_1(s) = 1$, $N_5(s) = 0$, $N_{10}(s) = 1$.] The following
remarks permit the calculation of E(N):

(a) $N(s) = N_1(s) + N_2(s) + \cdots + N_{10}(s) = \sum_{i=1}^{10} N_i(s)$.

(b) $E(N_i) = 1/2$.

To prove (a) note that the sum $\sum_{i=1}^{10} N_i(s)$ is a sum of 1’s and 0’s.
Furthermore, each 1 occurs when, and only when, a head appears. Thus $\sum_{i=1}^{10} N_i(s)$ =
number of heads in s = N(s).

To prove (b) we can calculate the distribution $p_i(x)$ of $N_i$. Since $N_i(s) = 1$
if, and only if, s has a head on the ith toss, we see that the probability
that $N_i = 1$ is 1/2. Similarly, $N_i = 0$ with probability 1/2. Thus the distribution
$p_i(x)$ of $N_i$ has the table

x      |  0  |  1
p_i(x) | 1/2 | 1/2

Thus $E(N_i) = \sum x\,p_i(x) = 0 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} = \tfrac{1}{2}$. This proves (b). Finally we have,
using (a) and (b),

$$E(N) = E(N_1) + E(N_2) + \cdots + E(N_{10}) = 10 \cdot \tfrac{1}{2} = 5$$

This example used an interesting idea. The number of heads was regarded 
as a sum of 0’s and 1’s, with summand 1 for each head. This idea will be 
used again in what follows. 
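Remarks (a) and (b) of Example 29 can both be checked by brute force over the 1,024 sample points. The Python sketch below confirms that the indicators add up to the number of heads at every point. It then computes each $E(N_i)$ as the fraction of points with a head in position i. The function name is our own.

```python
from itertools import product


def heads_via_indicators(n=10):
    """Example 29: N = N_1 + ... + N_n, where N_i(s) = 1 if toss i of s is a head."""
    points = ["".join(t) for t in product("HT", repeat=n)]   # 2**n equally likely points
    for s in points:
        indicators = [1 if toss == "H" else 0 for toss in s]
        assert sum(indicators) == s.count("H")                # remark (a)
    # remark (b): E(N_i) = fraction of the points whose ith toss is a head
    e_n = [sum(1 for s in points if s[i] == "H") / len(points) for i in range(n)]
    return e_n, sum(e_n)                                      # E(N) = E(N_1) + ... + E(N_n)
```

Each $E(N_i)$ comes out as 0.5, and their sum is 5.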


30 Example 
Five (different) cards are chosen at random from a standard deck. On the 
average, how many hearts are chosen? 


Method. Let N = N(s) be the number of hearts in the sample point s. As
in Example 29, let $N_1(s) = 0$ if the first card is not a heart, and $N_1(s) = 1$ if
the first card is a heart. If $N_2, \ldots, N_5$ are similarly defined, we have

$$N = N_1 + \cdots + N_5$$

$$E(N) = E(N_1) + \cdots + E(N_5)$$

But $E(N_1) = 0 \cdot \tfrac{3}{4} + 1 \cdot \tfrac{1}{4} = \tfrac{1}{4}$, because $N_1 = 1$ with probability 1/4, and $N_1 = 0$
with probability 3/4. Similarly, $E(N_i) = 1/4$. Thus

$$E(N) = 5 \cdot \tfrac{1}{4} = \tfrac{5}{4}$$

and an average of 5/4 = 1.25 hearts are drawn. Note that a direct computation of
E(N), using Definition 18 or Theorem 19, is cumbersome.
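To see that the indicator method agrees with the "cumbersome" direct route, the Python sketch below computes E(N) from Theorem 19. It uses the counting distribution p(x) = C(13, x)C(39, 5 - x)/C(52, 5), the same kind of argument as in Example 14. It then compares that result with the value 5 · (1/4). The function names are our own.

```python
from fractions import Fraction
from math import comb


def expected_hearts_direct(hand=5):
    """Theorem 19 with p(x) = C(13, x) C(39, hand - x) / C(52, hand)."""
    total = comb(52, hand)
    return sum(Fraction(x * comb(13, x) * comb(39, hand - x), total)
               for x in range(hand + 1))


def expected_hearts_indicators(hand=5):
    """Example 30: each N_i equals 1 with probability 13/52 = 1/4."""
    return hand * Fraction(13, 52)
```

Both functions return exactly 5/4.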


EXERCISES 


1. A box contains 50 black balls and 50 white balls. Thirty balls are 
selected (without replacement). What is the average number of black balls 
chosen? 


2. In Exercise 1 suppose the 30 balls were chosen with replacement. 
What would the average number of chosen black balls be? 


3a. A die is tossed. Let N be the number that turns up. Find E(N).
b. Suppose 2 dice are tossed. Let S be the sum of the numbers turning
up. Find E(S). (Hint: Let $N_1$ = the number on the first die and $N_2$ =
the number on the second die. Then $S = N_1 + N_2$.)
c. Suppose 10 dice are tossed. Let S = the sum of all the numbers.
Find E(S).


4. Each time a certain computing machine adds 2 numbers, its prob- 
ability of making an error is .0001. During the day this machine performs 
100,000 additions. What is the expected number of errors? 


5. Ten dice are tossed. What is the expected number of dice that turn
up showing 3 or higher?


6. One hundred cards are numbered from 1 through 100. Two of these
cards are chosen (without replacement). What is the expected value of the 
sum? 


7. In Exercise 6 suppose that x is the expected value of the high card 
chosen and y is the expected value of the low card. Evaluate x+y. (Hint: 
It is not necessary to evaluate x or y. This is a more difficult problem.) 


8. Let A be a subset of a sample space S. Let X(s) = 1 for s ∈ A and
let X(s) = 0 for s ∉ A. Prove E(X) = p(A).
9. Suppose E(X) = 1, E(X²) = 2. Find E(X - 1), E[(X - 1)²], and
E(2X + 1).


10. Suppose E(X) = m. Prove that E[(X - m)²] = E(X²) - m².


11. Ten balls, numbered 1 through 10, are placed at random into 10 boxes
similarly numbered. A match occurs if a ball is in a box with the same
number. What is the expected number of matches? If the balls are placed at
random into different boxes, what is the expected number of matches?
(Hint: Let $M_1 = 1$ if ball 1 is in box 1, $M_1 = 0$ otherwise. Similarly define
$M_2, \ldots, M_{10}$.)


5 CONDITIONAL EXPECTATIONS 


Conditional expectations play a role in the calculation of expectations very 
similar to the role of conditional probabilities in the calculation of prob- 
abilities. We now define this notion and give some of its applications. 


31 Definition 
Let X be a random variable on the probability space S, and let A be an
event of S with p(A) > 0. The conditional expectation of X, given A
[written E(X|A)], is defined by the formula

$$E(X|A) = \sum_{s \in A} p(s|A)\,X(s) \qquad (5.22)$$


Thus the value E(X|A) may be regarded as “the expected value of X on 
the assumption that A has occurred.” In order to relate E(X|A) with an 
experimental situation, imagine running an experiment several times and 
computing the average value $\bar{x}$ of X(s) only for those s in A. The relative
frequency of an occurrence s (in A) will be f(s|A). (See Definition 14 of
Chapter 2.) By Equation 5.11 the resulting average $\bar{x}$ will be given by

$$\bar{x} = \sum_{s \in A} f(s|A)\,X(s)$$

Finally, if a large number of experiments are run, f(s|A) will be near
p(s|A), so $\bar{x}$ will be near E(X|A). Thus: If a large number of experiments is
performed and only the values s ∈ A are counted, then the average of the
experimental values of X(s) will very likely be very close to E(X|A).

Conditional expectations may be put to use with the help of the following 
theorem, similar to the product rule for probabilities (Theorem 22 of 
Chapter 2) and the method of tree diagrams. 


32 Theorem 


Let $A_1, \dots, A_n$ partition a probability space S, and assume that $p(A_i) > 0$ for each i.
Let X be a random variable on S. Then
$$E(X) = p(A_1)E(X\mid A_1) + \cdots + p(A_n)E(X\mid A_n) = \sum_{i=1}^{n} p(A_i)\,E(X\mid A_i) \qquad (5.23)$$


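Proof. We compute the right-hand side: For each i,
$$p(A_i)E(X\mid A_i) = p(A_i)\sum_{s \in A_i} p(s\mid A_i)X(s) = \sum_{s \in A_i} p(A_i)\,p(s\mid A_i)X(s) = \sum_{s \in A_i} p(s)X(s)$$
because $p(A_i)p(s\mid A_i) = p(s)$ for $s \in A_i$, by the multiplication rule. Thus
$$\sum_{i=1}^{n} p(A_i)E(X\mid A_i) = \sum_{s \in A_1} p(s)X(s) + \cdots + \sum_{s \in A_n} p(s)X(s) = \sum_{s \in S} p(s)X(s) = E(X)$$
since the sets $A_i$ partition S. This completes the proof.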

Theorem 32 is useful when an experiment occurs in stages and tree
diagrams can be used. The events $A_i$ are thought of as the first stage. In
Fig. 5.16 a random variable X is given whose expected value is 3, provided
$A_1$ has occurred, but whose expected value is 7 when $A_2$ has occurred. $A_1$
and $A_2$ occur with probabilities .9 and .1, respectively. (They are assumed
mutually exclusive.) The expected value of X is computed to be 3.4.


  .9   A_1 -- E(X|A_1) = 3     (.9)(3) = 2.7
  .1   A_2 -- E(X|A_2) = 7     (.1)(7) = 0.7
                               E(X) = 3.4
5.16 Tree Diagram
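The identity of Theorem 32 can be checked mechanically. The sketch below, with helper names of our own choosing (`total_expectation` and `cond_exp` are hypothetical), first reproduces the arithmetic of Fig. 5.16 and then verifies Equation 5.23 on a small illustrative space (two fair coins, X = number of heads) partitioned by the outcome of the first coin.

```python
from fractions import Fraction
from itertools import product

def total_expectation(branches):
    """Equation 5.23: sum of p(A_i) * E(X|A_i) over the blocks of a partition."""
    return sum(p * e for p, e in branches)

# Fig. 5.16: p(A_1) = .9, E(X|A_1) = 3; p(A_2) = .1, E(X|A_2) = 7.
fig_5_16 = total_expectation([(Fraction(9, 10), 3), (Fraction(1, 10), 7)])

# Check on two fair coins, X = number of heads, partitioned by the first coin.
S = list(product("HT", repeat=2))
p = {s: Fraction(1, 4) for s in S}
X = {s: s.count("H") for s in S}

def cond_exp(block):
    """E(X|block) from Definition 31."""
    pA = sum(p[s] for s in block)
    return sum(p[s] / pA * X[s] for s in block)

blocks = [[s for s in S if s[0] == c] for c in "HT"]
via_theorem = total_expectation([(sum(p[s] for s in b), cond_exp(b)) for b in blocks])
direct = sum(p[s] * X[s] for s in S)
print(fig_5_16, via_theorem, direct)
```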


33 Example 


The average height of the boys in a class is 68 inches. The average height 
of the girls is 61 inches. Seventy-five percent of the class is male. What is 
the average height of a student in the class? 

In this problem we regard relative frequencies as probabilities, invoking
Section 5 of Chapter 1.²³ Then H = H(s), the height in inches of sample
point s, is a random variable on the sample space S of students. Let B = the
set of boys, G = the set of girls. We have

1. B and G partition S.
2. p(B) = 3/4, p(G) = 1/4.
3. E(H|B) = 68; E(H|G) = 61.

Thus, by Equation 5.23,
$$E(H) = p(B)E(H\mid B) + p(G)E(H\mid G) = (\tfrac34)(68) + (\tfrac14)(61) = 66.25$$

The computation is indicated in Fig. 5.17.

  3/4   B -- E(H|B) = 68     (3/4)(68) = 51
  1/4   G -- E(H|G) = 61     (1/4)(61) = 15.25
                             E(H) = (3/4)(68) + (1/4)(61) = 66.25
5.17 Tree Diagram for Heights of Students

²³ For example, we can make the class into a uniform sample space. The set B of boys will
then have probability p(B) = .75.


34 Example 


A coin is tossed 5 times or until a head turns up, whichever happens first.
On the average, how many times is the coin tossed?


Method 1. A direct computation is given in Fig. 5.18. The random variable 
N is the number of tosses for a given sample point s. Thus N(TTH) = 3. 
This computation shows that E(N) = 1.9375. 


H                  (N = 1)    p(1)N = (1/2)(1)  = 1/2
T H                (N = 2)    p(2)N = (1/4)(2)  = 1/2
T T H              (N = 3)    p(3)N = (1/8)(3)  = 3/8
T T T H            (N = 4)    p(4)N = (1/16)(4) = 1/4
T T T T, H or T    (N = 5)    p(5)N = (1/16)(5) = 5/16

E(N) = Σ p_i N_i = 1/2 + 1/2 + 3/8 + 1/4 + 5/16 = 31/16
5.18 Tree Diagram for Coin Tossing
Method 2. Conditional expectations appear to be useful here, because the
two cases (first toss H, or first toss T) can each be computed separately. To
do this we must first consider a 4-toss game (in which at most 4 tosses are
allowed). We then regard the given game in 2 stages: If the first toss is T, we
play the 4-toss game, but if the first toss is H, we stop. We let $N_k = N_k(s)$ be
the number of tosses in the k-toss game and take $N = N_5$. We then have
(where s is any sample point in the 4-toss game)


$$N_5(H, s) = 1$$
$$N_5(T, s) = 1 + N_4(s)$$
Thus
$$E(N_5\mid H) = E(N_5(H, s)) = 1$$
and
$$E(N_5\mid T) = E(N_5(T, s)) = E(1 + N_4(s)) = 1 + E(N_4)$$


This is indicated in Fig. 5.19.

  1/2   H -- E(N_5|H) = 1
  1/2   T -- E(N_5|T) = 1 + E(N_4)
5.19 Another Tree Diagram for Coin Tossing


Thus, using Equation 5.23, we have
$$E(N_5) = (\tfrac12)(1) + (\tfrac12)[1 + E(N_4)] = 1 + (\tfrac12)E(N_4) \qquad (5.24)$$


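Equation 5.24 evaluates $E(N_5)$ in terms of $E(N_4)$. But in the same way, we
can obtain
$$E(N_{k+1}) = 1 + (\tfrac12)E(N_k) \qquad (k = 1, 2, 3, 4) \qquad (5.25)$$
Thus, since $E(N_1) = 1$ (the 1-toss game always ends after exactly 1 toss),
$$E(N_2) = 1 + (\tfrac12)E(N_1) = \tfrac32$$
$$E(N_3) = 1 + (\tfrac12)E(N_2) = 1 + (\tfrac12)(\tfrac32) = \tfrac74$$
$$E(N_4) = 1 + (\tfrac12)(\tfrac74) = \tfrac{15}{8}$$
and, finally,
$$E(N_5) = 1 + (\tfrac12)(\tfrac{15}{8}) = \tfrac{31}{16} = 1.9375$$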


From this computation it is seen that regardless of how many tosses are
allowed, the expected number of tosses is less than 2.
In fact, if we continue in this way, we can show that
$$E(N_k) = 2 - \frac{1}{2^{k-1}} \qquad (k = 1, 2, \dots) \qquad (5.26)$$
It is natural to say that if an unlimited number of tosses were allowed (i.e.,
if there were no artificial stopping point), the expected number of tosses
would be 2. This is a correct statement, but it involves infinite sample spaces,
as in Section 2 of Chapter 4.
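Equation 5.25 is a recurrence, so Method 1, Method 2, and Equation 5.26 can all be compared numerically. The sketch below (function names are ours) computes $E(N_k)$ once from the recurrence starting at $E(N_1) = 1$, and once by brute force over all $2^k$ equally likely sequences of k tosses, treating tosses after the stopping point as irrelevant.

```python
from fractions import Fraction
from itertools import product

def expected_tosses_recurrence(k):
    """E(N_k) from E(N_1) = 1 and E(N_{j+1}) = 1 + (1/2) E(N_j)  (Equation 5.25)."""
    e = Fraction(1)
    for _ in range(k - 1):
        e = 1 + Fraction(1, 2) * e
    return e

def expected_tosses_brute_force(k):
    """Average stopping time over all 2**k equally likely sequences of k tosses.

    Play stops at the first H, or after k tosses if no H appears; every full
    sequence has weight 1/2**k, so this is the direct computation of Method 1.
    """
    total = 0
    for seq in product("HT", repeat=k):
        total += seq.index("H") + 1 if "H" in seq else k
    return Fraction(total, 2 ** k)

for k in range(1, 8):
    closed_form = 2 - Fraction(1, 2 ** (k - 1))   # Equation 5.26
    assert expected_tosses_recurrence(k) == expected_tosses_brute_force(k) == closed_form
print(expected_tosses_recurrence(5))   # 31/16
```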
35 Example
A die is tossed until a 6 turns up. What is the expected number N of tosses? 


Method. This problem involves an infinite sample space, but we shall 
nevertheless illustrate the technique of Theorem 32 here. Two cases can 
occur: If the first toss is a 6, then N = 1. [E(N| first toss is a 6) = 1.] If the 
first toss is not a 6, then E(N| first toss is not a 6)=1+E(N). For, if the 
first toss is not a 6, we are left in the same game, except that we have used up 
1 toss. Thus we have the tree diagram of Fig. 5.20. Then, using Theorem 32, 
we have 
$$E(N) = (\tfrac16)(1) + (\tfrac56)[1 + E(N)] = 1 + (\tfrac56)E(N)$$


When this equation is solved for E(N) we obtain E(N) = 6. This is the 
required expectation. 


  1/6   6 ------ E(N|6) = 1
  5/6   not 6 -- E(N|not 6) = 1 + E(N)
5.20 Tree Diagram for Dice
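Example 35's argument reduces to a linear equation in E(N). As a hedged check, the sketch below solves the general form $E = p \cdot 1 + (1 - p)(1 + E)$ for a success probability p (the parameter p and the function names are our own; the die corresponds to p = 1/6), and compares the answer with a seeded simulation of the die-tossing game.

```python
from fractions import Fraction
import random

def expected_tosses_until_success(p):
    """Solve E = p*1 + (1 - p)*(1 + E), i.e. E = 1 + (1 - p)E, so p*E = 1."""
    return 1 / p

def simulate(trials, seed=1):
    """Toss a fair die until a 6 appears; return the average number of tosses."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses = 1
        while rng.randint(1, 6) != 6:
            tosses += 1
        total += tosses
    return total / trials

p = Fraction(1, 6)
exact = expected_tosses_until_success(p)
estimate = simulate(60_000)
print(exact, round(estimate, 2))
```

The simulation only approximates 6; it is included to show the frequency interpretation, not as a proof.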


We finally give another direct illustration of Theorem 32.


36 Example 


A whole number X is chosen at random between 1 and 4 inclusive. The
number Y is then chosen at random between 1 and X. Find the expected
value of Y.


Method. If S is the appropriate sample space, it is clear that X(s) and Y(s)
are well defined. Further, E(Y|X = 1) = 1. Similarly, E(Y|X = 2) = (1/2)(1) +
(1/2)(2) = 3/2, E(Y|X = 3) = (1/3)(1 + 2 + 3) = 2, and E(Y|X = 4) = (1/4)(1 + 2 +
3 + 4) = 5/2. Thus we have Fig. 5.21, in which E(Y) is computed to be 7/4.


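  1/4   X = 1 -- E(Y|X = 1) = 1      p(1)E = (1/4)(1)
  1/4   X = 2 -- E(Y|X = 2) = 3/2    p(2)E = (1/4)(3/2)
  1/4   X = 3 -- E(Y|X = 3) = 2      p(3)E = (1/4)(2)
  1/4   X = 4 -- E(Y|X = 4) = 5/2    p(4)E = (1/4)(5/2)

E(Y) = (1/4)(1) + (1/4)(3/2) + (1/4)(2) + (1/4)(5/2) = 7/4
5.21 Tree Diagram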
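The tree-diagram computation of Fig. 5.21 can be written as a short exact calculation. This sketch (names ours) uses the fact that a uniform choice from 1, ..., x has mean (1 + ... + x)/x, and then applies Theorem 32 with p(X = x) = 1/4.

```python
from fractions import Fraction

def cond_mean_uniform(x):
    """E(Y | X = x) when Y is chosen uniformly from 1, ..., x."""
    return Fraction(sum(range(1, x + 1)), x)

branches = {x: cond_mean_uniform(x) for x in range(1, 5)}   # second stage of the tree
E_Y = sum(Fraction(1, 4) * e for e in branches.values())     # Theorem 32
print(E_Y)   # 7/4
```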




EXERCISES 


1. Forty percent of the people in a certain community have an average 
annual income of $7,200. The remaining 60 percent have an average annual 
income of $9,000. What is the average annual income of the people in the 
community? 


2. Two dice are tossed. If the sum is 8 or more, the player wins $1; other- 
wise, he wins nothing. If the sum is 10 or more, he is also entitled to another 
throw. On his second throw he wins, in dollars, the sum thrown on the second 
try. What are his expected winnings? 


3. A die is tossed repeatedly until the sum of all the numbers tossed is
5 or more. (For example, if 1, 3, 4 are tossed in order, the play would stop
after these 3 tosses, because 1 + 3 + 4 is larger than 5.) On the average, how
many times will the die be tossed? (Hint: After the first die is tossed, the
game is reduced to a similar game but with a smaller cutoff point. Thus
consider the games in which play stops as soon as the sum reaches 4, 3, 2, and 1, respectively.)


4. A coin is tossed until 2 consecutive heads occur. What is the average 
number of tosses required? (Ignore the fact that this game is potentially 
infinite in length.) 


5. Using the result of Exercise 4, what is the average number of times it is 
required to toss a coin until 3 consecutive heads occur? Generalize, by 
finding the average number of tosses needed to toss k consecutive heads. 


6a. If an integer N is chosen at random between 1 and n inclusive, prove
that E(N) = (n + 1)/2.
b. Suppose that an integer $N_1$ is chosen at random between 1 and n
inclusive, and then $N_2$ is chosen at random between 1 and $N_1$. Find
$E(N_2)$.
*c. Keep going! Choose $N_3$ at random between 1 and $N_2$. Find $E(N_3)$.
Similarly, find a formula for $E(N_k)$, where $N_k$ is the number chosen
after k trials. [Your intuition should tell you that $E(N_k)$ is almost 1
when k is large.]


6 JOINT DISTRIBUTIONS 


If X is a random variable on a probability space S, we have seen in Section
2 how many of the properties of X may be summarized by the distribution
p(x) of X. If X and Y are 2 random variables, both defined on the same
probability space, we may analogously define the joint distribution of X
and Y.
37 Definition


Let X and Y be random variables on a probability space S. Let x be one of
the values of X(s) and let y be one of the values of Y(s). Let A(x, y) be the 
set of sample points s such that X(s) =x and Y(s) = y. Then the function 
p(x, y) = p(A(x, y)) is called the joint distribution of X and Y. The variables 
x and y are assumed to range over all possible values of X(s) and Y(s) 
respectively. 


Remark. Using the notation of Definition 9, $A(x, y) = A(x) \cap B(y)$,
where A(x) is the set of s with X(s) = x and B(y) the set of s with Y(s) = y.
p(x, y) is simply the probability that X(s) = x and Y(s) = y. We sometimes 
write p(x, y) =p(X =x and Y=y). The values of p(x, y) can be con- 
veniently recorded in a table, as Example 38 illustrates. 


38 Example 


Two coins are tossed. Let X = 1 if the first coin is heads and X = 0 if the first
coin is tails. Let Y = number of heads tossed. Find the joint distribution
p(x, y) of X and Y.


Method. The values for x are 0 and 1; the values for y are 0, 1, and 2. Place 
the x values in a column and the y values in a row, as in Table 5.22. Then 
p(x, y) is computed in all cases and entered in the row opposite x and 
column under y. For example, p(0, 1) = probability the first coin is tails
and the number of heads (on 2 tosses) is 1. Thus p(0, 1) = 1/4. In the same
way, each of the 6 entries may be found. 


5.22 Joint Distribution

              y = 0   y = 1   y = 2   Total p(x)
  x = 0        1/4     1/4      0        1/2
  x = 1         0      1/4     1/4       1/2
  Total q(y)   1/4     1/2     1/4        1


Note that the sum of each row gives the distribution p(x) of X, while the
sum of each column gives the distribution q(y) of Y. This will be proved in
what follows, but it should already be clear. For p(0) = probability (X = 0) =
p(X = 0 and Y = 0) + p(X = 0 and Y = 1) + p(X = 0 and Y = 2). But this
is precisely the sum of the first row, and similarly for all rows and columns.
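Definition 37 and Table 5.22 translate directly into a dictionary keyed by (x, y). The sketch below (helper names are ours) builds p(x, y) for Example 38 by enumerating the four equally likely outcomes, then forms the row and column sums described above. Cells of probability 0, such as p(0, 2), are simply absent from the dictionary.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def joint_distribution(p, X, Y):
    """p(x, y) = p(A(x, y)), where A(x, y) = {s : X(s) = x and Y(s) = y}."""
    joint = defaultdict(Fraction)
    for s, ps in p.items():
        joint[(X(s), Y(s))] += ps
    return dict(joint)

def marginals(joint):
    """Row sums give p(x); column sums give q(y)."""
    px, qy = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), pxy in joint.items():
        px[x] += pxy
        qy[y] += pxy
    return dict(px), dict(qy)

# Example 38: two fair coins; X = 1 if the first coin is heads, Y = number of heads.
coins = {s: Fraction(1, 4) for s in product("HT", repeat=2)}
table = joint_distribution(coins, lambda s: int(s[0] == "H"), lambda s: s.count("H"))
px, qy = marginals(table)
print(table)
print(px, qy)
```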
39 Theorem


Let X and Y be random variables on a probability space S, and let p(x, y)
be their joint distribution. Then
$$p(x) = \sum_{y} p(x, y) \qquad (5.27)$$
is the distribution of X, while
$$q(y) = \sum_{x} p(x, y) \qquad (5.28)$$
is the distribution of Y.

Equation (5.27) expresses p(x) as the sum of all the entries in the row
corresponding to x, while (5.28) expresses q(y) as the sum of all the entries in
the column corresponding to y. (See Table 5.22.) The distributions p(x)
and q(y) obtained in this way from the joint distribution p(x, y) are
sometimes called marginal distributions.

Proof. Let A(x) = the set of s where X(s) = x, and let B(y) = the set of
s where Y(s) = y. Then, if $y_1, y_2, \dots, y_n$ are all possible values of y, we have
$$A(x) = \{A(x) \cap B(y_1)\} \cup \{A(x) \cap B(y_2)\} \cup \cdots \cup \{A(x) \cap B(y_n)\} \qquad (5.29)$$
This equation merely states that X(s) = x occurs with one of the possibilities
$Y(s) = y_j$. But these possibilities are mutually exclusive. Therefore we can
use Theorem 10 of Chapter 3 to obtain
$$p(x) = p(A(x)) = \sum_{j=1}^{n} p(A(x) \cap B(y_j)) = \sum_{j=1}^{n} p(x, y_j)$$
This proves Equation 5.27. Equation 5.28 may be proved similarly.


40 Example 


A whole number x is chosen at random between 1 and 4 inclusive. The
number y is then chosen at random between 1 and x. Find the expected value
of y.


Method. (Cf. Example 36.) We shall find the joint distribution p(x, y),
then the distribution q(y), and then the expected value. Table 5.23 is
computed separately along rows 1, 2, 3, and 4. For example, to find the third
row, we must compute p(3, y) = probability first number is 3 and the second
is y. This is seen to be (1/4)(1/3) = 1/12 for y = 1, 2, 3 but is 0 for y = 4. The
other rows are dealt with similarly. The distribution q(y) is obtained by
summing the columns. Finally, using Theorem 19, E(Y) = Σ y q(y) = 7/4 =
1.75.

5.23 Joint Distribution

              y = 1   y = 2   y = 3   y = 4   Total p(x)
  x = 1        1/4      0       0       0        1/4
  x = 2        1/8     1/8      0       0        1/4
  x = 3        1/12    1/12    1/12     0        1/4
  x = 4        1/16    1/16    1/16    1/16      1/4
  q(y)        25/48   13/48    7/48    3/48       1
  y q(y)      25/48   26/48   21/48   12/48     Σ y q(y) = 84/48 = 1.75 = E(Y)


Under certain circumstances the random variables X and Y are
independent. Intuitively, this means that if the value of X(s) is known, the
distribution of values of Y(s) is unaffected, and vice versa. For example,
if 2 dice are tossed and if X(s) = number on the first die while Y(s) = 
number on the second die, then X and Y are intuitively independent. The 
formal definition of independence brings in the notion of independent 
events. 


41 Definition 
Let X and Y be random variables on a sample space S. X and Y are said to
be independent if, for every possible value x of X and every possible value
y of Y, the events X = x and Y = y are independent events.

Equivalently, let A(x) be the set of s where X(s) = x and let B(y) be
the set of s where Y(s) = y. Then X and Y are independent provided that,
for all x and y,
$$p(A(x) \cap B(y)) = p(A(x)) \cdot p(B(y)) \qquad (5.30)$$


In fact, Equation 5.30 is precisely the condition that A(x) and B(y) be inde- 
pendent events (see Definition 31 and Equation 23 of Chapter 3). 
We may rephrase Equation 5.30 using distributions. 


42 Theorem 


Let X and Y be random variables with joint distribution p(x, y). Let X
have distribution p(x) and let Y have distribution q(y). Then X and Y are
independent if and only if
$$p(x, y) = p(x)\,q(y) \quad \text{for all } x \text{ and } y \qquad (5.31)$$
Proof. Using Definitions 9 and 37, we have
$$p(x, y) = p(A(x) \cap B(y)), \qquad p(x) = p(A(x)), \qquad q(y) = p(B(y))$$
Thus Equation 5.30 is equivalent to Equation 5.31.


43 Corollary 
The joint distribution p(x, y) of X and Y determines whether X and Y are 
independent or not. 

In fact, we have seen in Theorem 39 that the distributions p(x) of X and
q(y) of Y are determined by the joint distribution of X and Y. Thus it is
possible to verify Equation 5.31 knowing only the values p(x, y).
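Corollary 43 says the test needs nothing but the table of p(x, y). A hedged sketch of that test follows (the function name is ours). Cells missing from the dictionary count as probability 0, and Equation 5.31 is checked for every pair of x and y values, zero cells included. It is run on Table 5.22, which fails the test because p(0, 0) = 1/4 while p(0)q(0) = 1/8, and on the two-dice example mentioned before Definition 41.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check Equation 5.31, p(x, y) = p(x) q(y), for every x and every y."""
    px, qy = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), pxy in joint.items():
        px[x] += pxy
        qy[y] += pxy
    return all(joint.get((x, y), 0) == px[x] * qy[y] for x in px for y in qy)

# Table 5.22 (Example 38); the zero cells p(0, 2) and p(1, 0) are omitted.
coins = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}
# Two fair dice: X = number on the first die, Y = number on the second die.
dice = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}
print(is_independent(coins), is_independent(dice))   # False True
```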


44 Example 


X and Y are random variables whose joint distribution is given in Table 
5.24. Determine whether X and Y are independent. 


5.24 Joint Distribution 




Method. We find p(x) by summing rows and q(y) by summing columns,
then check whether p(x, y) = p(x)q(y) for all x and y. In fact, this is seen to be the case in Fig.
5.25, and therefore X and Y are independent. The special case p(1, 3) =
p(1)q(3) is illustrated: .08 = (.4)(.2).


p(x, y) = p(x)q(y) 
Thus the independence of X and Y can be inferred from Theorem 42 as
in Example 44. However, there are better ways, in practice, to infer the
independence of random variables. A most useful method occurs when a
sample space is a product space S × T and the value X(s, t) depends only
on s, while Y(s, t) depends only on t:


$$X(s, t) = X(s), \qquad Y(s, t) = Y(t)$$


In this case Theorem 51 of Chapter 3 shows that X and Y are independent. 
For example, suppose that a number X is determined by the result of an 
experiment. Let the experiment be repeated (independently) twice. Let
$X_1$ be the number determined by the first experiment and let $X_2$ be
determined by the second experiment. Then $X_1$ and $X_2$ are independent random
variables. This technique will be useful in Chapter 6 when we analyze,
theoretically, the notion of independent repetitions of an experiment. 

The following theorem computes E(XY) from the joint distribution of 
X and Y. 


45 Theorem 
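Let X and Y be random variables on sample space S with joint distribution
p(x, y), where x ranges over $x_1, \dots, x_m$ and y over $y_1, \dots, y_n$. Then
$$E(XY) = \sum_{x,\,y} xy\,p(x, y) \qquad (5.32)$$
$$E(XY) = \sum_{y} \sum_{x} xy\,p(x, y) \qquad (5.33)$$
$$E(XY) = \sum_{x} \sum_{y} xy\,p(x, y) \qquad (5.34)$$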


Proof. We first note the meaning of the equations. In each equation the 
value xyp(x, y) is computed for each possible value of x and y. Thus the 
values $xy\,p(x, y)$ are found for all points in the rectangular array as in Fig.
5.26. Equation 5.32 gives E(XY) as the sum of all these values. Equation
5.33 computes this sum by finding the sum for each column, $\sum_x xy\,p(x, y)$,
and then summing these results, $\sum_y \sum_x xy\,p(x, y)$. Similarly, Equation 5.34
sums each row and then sums the results. Sums such as those in Equations 5.32
through 5.34 are often called double sums, because they involve sums of
sums.

5.26 Rectangular Array (rows $x_1, \dots, x_m$; columns $y_1, \dots, y_n$; the entry in row $x_i$ and column $y_j$ is $x_i y_j\,p(x_i, y_j)$)

To prove the result, we go to the definition of expectation. We have
$$E(XY) = \sum_{s \in S} p(s)X(s)Y(s)$$
We now break this sum into mn parts, one for each choice of x and y:
$$E(XY) = \sum_{x,\,y} \; \sum_{X(s) = x,\ Y(s) = y} p(s)X(s)Y(s)$$
Each individual sum $\sum_{X(s) = x,\ Y(s) = y} p(s)X(s)Y(s)$ can be computed easily:
$$\sum_{X(s) = x,\ Y(s) = y} p(s)X(s)Y(s) = \sum_{X(s) = x,\ Y(s) = y} p(s)\,xy = xy \sum_{X(s) = x,\ Y(s) = y} p(s) = xy\,p(x, y)$$
Thus
$$E(XY) = \sum_{x,\,y} xy\,p(x, y)$$
This is the result.
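Equations 5.32 through 5.34 differ only in the order of summation. The sketch below (variable names ours) evaluates all three on the joint distribution of Example 38, stored as a full array with zero cells included, and compares them with the defining sum over the four outcomes.

```python
from fractions import Fraction
from itertools import product

# Example 38: two fair coins; X = 1 if the first coin is heads, Y = number of heads.
coins = {s: Fraction(1, 4) for s in product("HT", repeat=2)}

def X(s):
    return int(s[0] == "H")

def Y(s):
    return s.count("H")

# Full m-by-n array p(x, y), zero cells included.
xs, ys = [0, 1], [0, 1, 2]
p = {(x, y): sum((ps for s, ps in coins.items() if X(s) == x and Y(s) == y), Fraction(0))
     for x in xs for y in ys}

direct = sum(ps * X(s) * Y(s) for s, ps in coins.items())          # definition
all_cells = sum(x * y * p[x, y] for x in xs for y in ys)           # (5.32)
columns_first = sum(sum(x * y * p[x, y] for x in xs) for y in ys)  # (5.33)
rows_first = sum(sum(x * y * p[x, y] for y in ys) for x in xs)     # (5.34)
print(direct, all_cells, columns_first, rows_first)                # 3/4 3/4 3/4 3/4
```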


We can now prove the following theorem, which will be useful for theo- 
retical purposes in Chapter 6. 


46 Theorem 
Let X and Y be independent random variables on a sample space S. Then
$$E(XY) = E(X)E(Y) \qquad (5.35)$$


Proof. We let p(x) be the distribution of X and q(y) the distribution of Y.
Then, by Theorem 42, the joint distribution p(x, y) of X and Y is given by
$$p(x, y) = p(x)q(y)$$
Using Theorems 45 and 19,
$$\begin{aligned}
E(XY) &= \sum_{y}\sum_{x} xy\,p(x, y) \\
&= \sum_{y}\Big[\sum_{x} xy\,p(x)q(y)\Big] \\
&= \sum_{y} \big(y\,q(y)\big) \sum_{x} x\,p(x) \\
&= \sum_{y} y\,q(y)\,E(X) \\
&= E(X) \sum_{y} y\,q(y) \\
&= E(X)E(Y)
\end{aligned}$$


The reader should reconsider Example 26, where it was observed that
$E(XY) \neq E(X)E(Y)$ in general.
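Theorem 46 and the warning that follows it can both be seen on two fair dice. In the sketch below (names ours, example our own choice), X and Y are the numbers on two different dice, which are independent, so E(XY) = E(X)E(Y) = 49/4. Replacing Y by X itself gives $E(X^2) = 91/6$, which differs from $E(X)^2 = 49/4$.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # 36 equally likely outcomes
p = Fraction(1, 36)

def E(f):
    """Expectation of the random variable f(s) on the uniform space S."""
    return sum(p * f(s) for s in S)

def X(s):
    return s[0]                              # first die

def Y(s):
    return s[1]                              # second die, independent of X

independent_case = (E(lambda s: X(s) * Y(s)), E(X) * E(Y))
dependent_case = (E(lambda s: X(s) * X(s)), E(X) * E(X))
print(independent_case, dependent_case)
```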


EXERCISES 


1. X and Y are defined according to the following table: 


Construct a table for p(x, y), the joint distribution of X and Y. Use this
table to compute p(x) and q(y), the distributions of X and Y. Are X and Y
independent? Explain.


2. Let X and Y have a joint distribution p(x, y) in the following table: 


0/1/42 
-10 |.20 05 |. 
sxe] 


2010 1.101.05 


a. What is the distribution of X?
b. What is the distribution of Y?
c. Are X and Y independent?

d. Find the probability that X = Y. 
e. Find the probability that X < Y. 


f. Find E(X), E(Y), and E(XY).
g. Find $E(X^2Y)$.


3. If E(XY) = E(X)E(Y), are X and Y independent? Explain.


4. In the following (incomplete) table for the joint distribution p(x, y) of 
X and Y, it is known that X and Y are independent. Complete this table.


5. A whole number is chosen at random from 1 through 10. If it is even, 
another number is chosen, but if it is odd, the play stops. Let X be the first 
number chosen and let Y be how many numbers are chosen (1 or 2). Construct
a table for the joint distribution of X and Y. 


6. A number n is chosen at random between 1 and 10 inclusive. Let
D = D(n) =the number of positive divisors of n, and let P = P(n) = the 
number of prime divisors of n. [Note that 1 is not considered a prime, so 
P(1) =0.] Construct a table for the joint distribution p(d, p) of D and P. 


7. Prove that if X and Y have joint distribution p(x, y), then
$$E(X + Y) = \sum_{x}\sum_{y} (x + y)\,p(x, y)$$
$$E(X^2Y - Y^2) = \sum_{x}\sum_{y} (x^2y - y^2)\,p(x, y)$$
More generally, prove that if f(x, y) is any function of x and y, then
$$E[f(X, Y)] = \sum_{x}\sum_{y} f(x, y)\,p(x, y)$$


8. Prove: If X = constant and Y is an arbitrary random variable, then 
X and Y are independent. 


9. Prove: If $Y = X^2$, then X and Y are independent if and only if Y = constant.


10. Prove that if the joint distribution p(x, y) of 2 independent random 
variables is arranged in the usual tabular form, then the rows are all propor- 
tional. Prove, conversely, that if the rows of the joint distribution p(x, y) are 
proportional, then X and Y are independent. 
*7 INTRODUCTION TO GAME THEORY


This section concerns an interesting and surprising application of probability 
to the theory of games. Although the games in this section will be compara- 
tively simple, the analysis is very similar for much more complicated games. 

Suppose that Arthur (player A) plays Bernard (player B) the following 
game: Each player chooses a head (H) or a tail (T) without knowledge of 
the other’s choice. If the choices do not match, A loses $1 to B. If they 
match (HH), A wins $2. If they match (TT) A wins $1. How should A play? 
How should B play? 

This game is simply described by the payoff matrix in Table 5.27, which 
indicates A’s winnings in all cases. (B loses the same amount, or wins the 
negative amount.) A pretheoretical analysis leads to the following thoughts:
“A should play H because he wins more. But then B, figuring this out, should
play T to take advantage of A’s greed. But then A, sensing this, should play
T, to double-cross B. But B, figuring this out, should play H. But then A
should. . . .” This sort of infinite outsmarting leads nowhere. How then
should A play? 


5.27 A’s Winnings (Payoff Matrix)

               B plays H   B plays T
  A plays H        2          −1
  A plays T       −1           1


Suppose A decides to play in the most conservative manner, that is,
suppose A were to assume that B knows his (A’s) move. Then, clearly,
regardless of A’s move (H or T), A will lose $1 (or win $−1). Probability
enters the picture if A decides to choose H and T with certain probabilities.
In this way even A will not know what his move will be (until he plays).
For example, suppose A chooses H and T, each with probability 1/2, and
suppose A tells this to B. (We say that A plays (1/2)H + (1/2)T.) What does B do?
B will presumably maximize his expected winnings (or, equivalently, he
will minimize A’s expected winnings). Whether B decides to play T or H,
A’s winnings are indicated in Fig. 5.28. A will win (on the average) $0 or $1/2
according as B plays T or H. Thus, from a position where A must lose $1,
he has arrived at a position where A will expect to win $0 or $1/2. B will
play T, of course, in this situation, and A will break even.

A can now consider other probabilities for H and T. In general, the situa- 
tion when 4 chooses H and T with probability p and 1—p, respectively, is 
  B plays T:  A plays H (prob. 1/2): −1 -> −1/2    A plays T (prob. 1/2):  1 -> +1/2    E(W) = 0
  B plays H:  A plays H (prob. 1/2):  2 -> 1       A plays T (prob. 1/2): −1 -> −1/2    E(W) = 1/2
5.28 A’s Winnings if A Plays (1/2)H + (1/2)T


indicated in Fig. 5.29. [We say that A plays pH + (1 − p)T.] Even if B knows
A’s strategy, all he can do is play T or H, which makes A’s expected winnings
1 − 2p or −1 + 3p, respectively; he will choose whichever is smaller.
The smaller of 1 − 2p and −1 + 3p is indicated by the dark, broken line in
Fig. 5.30. It is found by graphing the 2 lines y = 1 − 2p and y = −1 + 3p and, for
each p, choosing the smaller y value.

  B plays T:  A plays H (prob. p): −1 -> −p      A plays T (prob. 1 − p):  1 -> 1 − p     E(W) = 1 − 2p
  B plays H:  A plays H (prob. p):  2 -> 2p      A plays T (prob. 1 − p): −1 -> −1 + p    E(W) = −1 + 3p
5.29 A’s Winnings [A plays pH + (1 − p)T]

[Figure: the lines y = 1 − 2p and y = −1 + 3p for 0 ≤ p ≤ 1, with the smaller of the two drawn as a dark, broken line]
5.30 A plays pH + (1 − p)T; A’s guaranteed winnings

If A plays conservatively, he will choose the value of p where the two lines
cross, 1 − 2p = −1 + 3p, that is, p = 2/5, thereby guaranteeing himself an
expected winning of 1/5, regardless of how B plays. For any other choice of p,
A can expect to win less if B plays properly. Thus A plays (2/5)H + (3/5)T and
will win (on the average) $1/5 regardless of B’s move. (Interestingly, we can
also say that A will most likely play T, but plays H occasionally “to keep B
honest.” However, the above analysis does not ascribe any such motivations
to A. A merely wants to maximize his guaranteed expected winnings.)
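The conservative reasoning behind Fig. 5.30 can be sketched numerically. Assuming the payoff matrix of Table 5.27 (rows: A’s choice; columns: B’s choice), the function below computes A’s guaranteed expected winnings for a mixed strategy pH + (1 − p)T as the smaller of A’s expected winnings against B’s two pure replies, and scans p in steps of 1/100. The grid search and all names are our own illustration, not the book’s method; it only confirms the crossing point p = 2/5.

```python
from fractions import Fraction

# Table 5.27, A's winnings: PAYOFF[a][b] with a = A's choice, b = B's choice.
PAYOFF = {"H": {"H": 2, "T": -1},
          "T": {"H": -1, "T": 1}}

def expected_winnings(p, b):
    """A plays H with probability p and T with probability 1 - p; B plays b."""
    return p * PAYOFF["H"][b] + (1 - p) * PAYOFF["T"][b]

def guaranteed(p):
    """The lower envelope in Fig. 5.30: B picks the reply that is worst for A."""
    return min(expected_winnings(p, b) for b in "HT")

grid = [Fraction(k, 100) for k in range(101)]
best_p = max(grid, key=guaranteed)
print(best_p, guaranteed(best_p))   # 2/5 1/5
```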

Now the above analysis is from A’s point of view. We have seen that A
can guarantee himself an average win of $.20 per game. What about B? He
can reason similarly. Suppose B played most conservatively and announced
to A that he intends to play qH + (1 − q)T. In Fig. 5.31, we can compute
A’s winnings on the assumption that A plays H or T. If B announces his
move, B will assume that A will maximize A’s winnings and will play T or
H according as 1 − 2q or −1 + 3q is larger. We thus construct Fig. 5.32,
which is similar to Fig. 5.30. Here, regardless of how B plays, A can win at least
$1/5 on the average. But if B plays (2/5)H + (3/5)T (the value of q where
1 − 2q = −1 + 3q), A’s expected winnings will be $1/5, regardless of how A plays.


  A plays T:  B plays H (prob. q): −1 -> −q      B plays T (prob. 1 − q):  1 -> 1 − q     E(W) = 1 − 2q
  A plays H:  B plays H (prob. q):  2 -> 2q      B plays T (prob. 1 − q): −1 -> −1 + q    E(W) = −1 + 3q
5.31 A’s Winnings [B plays qH + (1 − q)T]

5.32 B plays qH + (1 − q)T; A’s Maximal Winnings


We summarize the situation. We allow both A and B to play H and T with
any probabilities. If A plays (2/5)H + (3/5)T, nothing B does will affect A’s
expected winnings: A will win $1/5 on the average. If B plays (2/5)H + (3/5)T,
nothing A does will affect A’s expected winnings: A will win $1/5 on the
average. By introducing probability, we have given both A and B strategies,
and we see that the value of this game to A is $1/5.

The theory of games considers very broad generalizations of this pro- 
cedure. We now state, without proof, a main result of that theory. Suppose 
A and B play a game in which A must choose any one of m strategies
$s_1, \dots, s_m$, while B must choose any one of n strategies $t_1, \dots, t_n$. If A chooses
$s_i$ and B chooses $t_j$, then A wins $a_{ij}$ while B wins $-a_{ij}$. (The matrix $[a_{ij}]$ is
called the payoff matrix of the game.) A and B are to choose their respective
strategies without knowledge of the other’s choice. Under these conditions,
there is a value V (called the value of the game) with the following properties:
A can choose strategy $s_i$ with probability $p_i$ ($\sum p_i = 1$) in such a way that,
regardless of B’s strategy, A’s expected winnings will be V or more. B can
choose strategy $t_j$ with probability $q_j$ ($\sum q_j = 1$) in such a way that, regardless
of A’s strategy, A’s expected winnings will be V or less.

We illustrate this result with one more example. 


47 Example 
A and B play a game in which each chooses H or T without knowledge of 
the other’s choice. A wins according to the following payoff matrix:

               B plays H   B plays T
  A plays H        2           1
  A plays T        0           5

B loses the amount that A wins. What is A’s correct strategy? On the average,
what can A expect to win? 


Method. If A plays pH + (1 − p)T, we can compute A’s winnings as
follows:

B plays H: A wins p · 2 + (1 − p) · 0 = 2p

B plays T: A wins p · 1 + (1 − p) · 5 = 5 − 4p

Graphing y = 2p and y = 5 − 4p, we obtain Fig. 5.33. The graphs intersect
where 2p = 5 − 4p. Thus p = 5/6 and y = 2p = 5/3. The dark, broken line
indicates A’s guaranteed winnings. If A plays H with probability 5/6 and T
with probability 1/6 (the bluff!), A’s expected winnings will be 5/3, which is the
value of the game.
5.33 A’s Guaranteed Winnings When A Plays pH + (1 − p)T.


How should B play? If he plays qH + (1 − q)T, then A’s winnings are as
follows:
A plays H: A wins q · 2 + (1 − q) · 1 = 1 + q
A plays T: A wins q · 0 + (1 − q) · 5 = 5 − 5q

A’s maximal winnings are indicated in Fig. 5.34. The graphs intersect
when 1 + q = 5 − 5q, or q = 2/3, and y = 1 + q = 5/3, as before. The dark,
broken line indicates A’s maximal winnings. If B plays H with probability 2/3
and T with probability 1/3 (B’s bluff!), A’s expected winnings will be 5/3, the
value of the game.


5.34 A’s Maximal Winnings When B Plays qH + (1 − q)T.
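For a 2 × 2 game with no saddle point (no pure strategy that is safe for both players), the two intersections found above can be written in closed form. The sketch below is a hedged illustration under that assumption: the formulas are our own derivation, obtained by setting the two expected winnings equal as in Figs. 5.33 and 5.34, and are not stated in the text. With payoff matrix [[a, b], [c, d]] (rows: A plays H, T; columns: B plays H, T), it returns A’s probability p of H, B’s probability q of H, and the value V. The saddle-point test compares A’s best guaranteed pure payoff with B’s.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Mixed strategies for a 2-by-2 zero-sum game WITHOUT a saddle point.

    Rows are A's choices (H, T); columns are B's choices (H, T).
    p solves p*a + (1-p)*c == p*b + (1-p)*d   (A equalizes B's two replies).
    q solves q*a + (1-q)*b == q*c + (1-q)*d   (B equalizes A's two replies).
    """
    M = [[a, b], [c, d]]
    maximin = max(min(row) for row in M)                   # A's best pure guarantee
    minimax = min(max(M[0][j], M[1][j]) for j in range(2))  # B's best pure guarantee
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is optimal")
    denom = a - b - c + d        # nonzero when there is no saddle point
    p = Fraction(d - c, denom)
    q = Fraction(d - b, denom)
    value = Fraction(a * d - b * c, denom)
    return p, q, value

print(*solve_2x2(2, -1, -1, 1))   # Table 5.27: 2/5 2/5 1/5
print(*solve_2x2(2, 1, 0, 5))     # Example 47: 5/6 2/3 5/3
```

Both outputs agree with the graphical solutions in the text: (2/5, 2/5, 1/5) for the game of Table 5.27 and (5/6, 2/3, 5/3) for Example 47.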


It may be argued that the games considered are so childish, dull, and un- 
interesting that they do not deserve to be designated games. However, we 
point out that these are only the simplest games and are used because they
are capable of being analyzed simply. An argument can be made that many
two-person, zero-sum games can be put into the form considered on page
197. (A zero-sum game is one in which one person loses what another
wins; there is no outside source of money that will make them “cooperate.”)
To see this, imagine any game (blackjack, poker, Monopoly, chess, ...).
Then imagine writing down all strategies. These strategies tell anyone what
you will do in any circumstance. Admittedly, each strategy will tend to be
book length, and there will be an enormous number of strategies. However,
once these strategies are listed ($s_1, \dots, s_m$), all you do to play the game is to
choose one strategy $s_i$! Thus we are back in the simple situation described
on page 197.

For an introduction to game theory, written on an elementary level, the 
reader is referred to The Compleat Strategyst by John D. Williams (New 
York, McGraw-Hill, 1954). 


EXERCISES 


1. Compute the value of the game and A’s correct strategy if A’s payoff 
matrix is 


Also compute B’s strategy. 


2. As in Exercise 1, compute the value and strategies for the payoff matrix
2 1 


3. As in Exercise 1, compute the value and strategies for the payoff matrix
1 4 
3 § 


(Hint: Think about this game before writing any equations!)
4. As in Exercise 1, compute the value and strategies for the payoff matrix
1  −2
2   4
5. A and B play the following game. Each puts out 1 or 2 fingers. B pays 
A the sum of the fingers. However, if both put out 2 fingers, then A pays B 
the sum (4) of the fingers. What should A do? What should B do? How much 
can A expect to win? 


6. A and B play a game with the following payoff matrix for A: 


a. What should A do? How much can A expect to win?
b. What should B do? (Hint: Look at A’s best strategy. If you were B, 
you would not play any strategy that gives A more than he deserves.) 


*8 SYMMETRY IN RANDOM VARIABLES 


In Section 4 of Chapter 4 we discussed the use of symmetry in computing
probabilities. We shall now show how symmetry considerations can be used
in the computation of expected values. We recall that a transformation
s → s' of S into S is called a symmetry if each point of S is equal to s' for
one and only one value of s, and if p(s) = p(s') for every s (cf. Definitions 23
and 24 of Chapter 4). If X is a random variable on S, we define a random variable
X' by the formula

$$X'(s) = X(s') \qquad (5.36)$$


For example, suppose 10 coins are tossed and S is the appropriate sample
space of $2^{10}$ elements. Let s → s' change heads to tails and tails to heads.
(For example, HTTHH... → THHTT....) Then if X = number of heads,
X' = number of tails. For X'(s) = X(s') = number of heads in s' = number
of tails in s. Similarly, if Y = excess of heads over tails, Y' = excess of tails
over heads. In Section 4 of Chapter 4 the fundamental result was that if A
and A' were corresponding sets under a symmetry, then A and A' had the
same probability. We now state and prove the analogous result for random 
variables. 
48 Theorem


Let s → s' be a symmetry of S, let X be a random variable on S, and let X'
be the random variable defined by Equation 5.36. Then
$$E(X) = E(X') \qquad (5.37)$$
Proof. We use the fact that as s runs through S, so does s'.
$$\begin{aligned}
E(X') &= \sum_{s \in S} p(s)X'(s) && \text{(definition)} \\
&= \sum_{s \in S} p(s)X(s') && \text{(Equation 5.36)} \\
&= \sum_{s \in S} p(s')X(s') && (s \to s' \text{ is a symmetry}) \\
&= \sum_{s' \in S} p(s')X(s') && \text{(remark above)} \\
&= E(X) && \text{(definition)}
\end{aligned}$$


We now show how Theorem 48 can be exploited in the computation of
some expected values.


49 Example 

Ten fair coins are tossed. On the average, how many heads turn up? 

Method. Intuitively, since a head and a tail are equally likely, we may
expect as many heads as tails. Hence, on the average, there are
5 heads (and 5 tails). We now formalize this reasoning.


Let X = number of heads. (We are taking the usual sample space of $2^{10}$
elements.) Then, if s — s’ interchanges H and T, X’ = number of tails. But 


number of heads + number of tails = 10 
or 
X+X'=10 (5.38) 
Taking expected values, 


E(X)+E(X’) = E(10) = 10 
Since E(X’) = E(X) by Theorem 48, we have 
2E(X) = 10 
E(X) =5 


Thus there are 5 heads on the average. 
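Theorem 48 and Example 49 can be confirmed by brute force on the $2^{10}$ outcomes. The sketch below (names ours) applies the head/tail interchange s → s', checks that it maps S one-to-one onto itself, forms X'(s) = X(s') as in Equation 5.36, and verifies Equation 5.38 before comparing E(X) with E(X').

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=10))           # 2**10 equally likely outcomes
p = Fraction(1, len(S))

def swap(s):
    """The symmetry s -> s': interchange H and T in every position."""
    return tuple("T" if c == "H" else "H" for c in s)

def X(s):
    return s.count("H")                      # number of heads

def X_prime(s):
    return X(swap(s))                        # Equation 5.36: X'(s) = X(s')

assert sorted(map(swap, S)) == sorted(S)     # each point is s' for exactly one s
assert all(X(s) + X_prime(s) == 10 for s in S)   # Equation 5.38
E_X = sum(p * X(s) for s in S)
E_X_prime = sum(p * X_prime(s) for s in S)
print(E_X, E_X_prime)                        # 5 5
```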


50 Example 


A whole number N is chosen at random from 1 through 100. What is the
average value of N?
Method. We can do this directly from the definition. If k = 1, 2, ..., 100,
then N = k with probability 1/100. Thus
$$E(N) = \tfrac{1}{100}\cdot 1 + \tfrac{1}{100}\cdot 2 + \cdots + \tfrac{1}{100}\cdot 100 = \tfrac{1}{100}(1 + 2 + \cdots + 100) = \tfrac{1}{100}\cdot\frac{100 \cdot 101}{2} = 50.5$$
Here we have used the formula for summing an arithmetic progression:
$$1 + 2 + \cdots + n = \frac{n(n+1)}{2}$$

We now prove that E(N) = 50.5 by symmetry considerations. Corresponding
to any number n, we determine n' = 101 − n. Thus n → n', and
we have 1 → 100, 2 → 99, .... This is a symmetry. Since N(n) = n, we
have 


$$N'(n) = N(n') = N(101 - n) = 101 - n = 101 - N(n)$$

Thus

$$N' = 101 - N$$

Taking expected values and using E(N') = E(N), we obtain

$$E(N) = 101 - E(N)$$

Thus 2E(N) = 101 and E(N) = 50.5.
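Both routes to E(N) = 50.5 are easy to mechanize. In this sketch (function names hypothetical) the first function sums k/100 directly, the second uses only the symmetry relation 2E(N) = 101, and a one-line check confirms that n → 101 − n permutes 1, ..., 100, which is what makes it a symmetry of the uniform sample space.

```python
from fractions import Fraction

def mean_uniform(n):
    """E(N) for N uniform on 1..n, computed directly from the definition."""
    return sum(Fraction(k, n) for k in range(1, n + 1))

def mean_by_symmetry(n):
    """Use N' = (n + 1) - N and E(N') = E(N), so 2E(N) = n + 1."""
    return Fraction(n + 1, 2)

# k -> n + 1 - k permutes 1..n, so it is a symmetry of the uniform space.
n = 100
assert sorted(n + 1 - k for k in range(1, n + 1)) == list(range(1, n + 1))
print(mean_uniform(n), float(mean_uniform(n)), mean_by_symmetry(n))  # 101/2 50.5 101/2
```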


51 Example 


Three dice are tossed. Let L = lowest number turning up, H = highest
number turning up, M = middle number turning up. Prove that
E(H) + E(L) = 7 and that E(M) = 3.5.


Method. Before proving this formally let us note that it is intuitively 
obvious. 


5.35 Outcome of 3 Dice, with Expectations
(Left: L, M, H for one toss, on a scale from 1 to 6. Right: E(L), E(M), E(H) on the same scale.)


In the left half of Fig. 5.35 we have a possible outcome of a toss. If
averaging occurs, we may (intuitively) expect E(L), E(H), E(M) to lie as in the
right half of Fig. 5.35. Thus, since M is just as likely to fall low as high, we
split the difference and expect E(M) = (1/2)(1 + 6) = 3.5. Also, if E(L) is h
above 1, we (intuitively) expect E(H) to be h below 6 “by symmetry.”
Thus E(L) = 1 + h, E(H) = 6 − h, and therefore E(L) + E(H) = 7. The only
problem, then, is to translate intuitive symmetry into concrete
(mathematical) terms.


The trick is to change high into low by the correspondence

1 → 6, 2 → 5, 3 → 4, 4 → 3, 5 → 2, 6 → 1


Thus, if n is a point of a die, n' = 7 − n. (In Fig. 5.35 this reflects points
about the value 3.5.) Now suppose 3 dice are tossed, with sample point
s = (n_1, n_2, n_3). Then

$$s' = (7 - n_1,\ 7 - n_2,\ 7 - n_3)$$

The high point goes into the low point, the middle into the middle, and the
low into the high. Thus
$$L'(s) = L(s') = \text{lowest of } (7 - n_1,\ 7 - n_2,\ 7 - n_3) = 7 - \text{highest of } (n_1, n_2, n_3)$$

$$L'(s) = 7 - H(s) \qquad (5.39)$$

Similarly,

$$M'(s) = M(s') = \text{middle of } (7 - n_1,\ 7 - n_2,\ 7 - n_3) = 7 - \text{middle of } (n_1, n_2, n_3)$$

$$M'(s) = 7 - M(s) \qquad (5.40)$$


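Now take expected values in Equations 5.39 and 5.40, using Theorem 48
(E(L') = E(L) and E(M') = E(M)). We obtain

$$E(L) = 7 - E(H)$$

$$E(M) = 7 - E(M)$$

which are the results: the first gives E(L) + E(H) = 7, and the second gives
2E(M) = 7, or E(M) = 3.5.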
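For 3 dice the sample space has only 6³ = 216 points, so Example 51 can be verified by enumeration. The sketch below (hypothetical helper `dice_order_means`, assuming an odd number k of dice so that the middle value is defined) computes E(L), E(M), E(H) exactly. The individual values E(L) = 49/24 and E(H) = 119/24 are not needed for the symmetry argument, but their sum is 7 as claimed.

```python
from fractions import Fraction
from itertools import product

def dice_order_means(k=3):
    """Exact E(L), E(M), E(H): lowest, middle, highest of k fair dice (k odd)."""
    outcomes = list(product(range(1, 7), repeat=k))
    p = Fraction(1, len(outcomes))
    e_low = e_mid = e_high = Fraction(0)
    for s in outcomes:
        ordered = sorted(s)
        e_low += p * ordered[0]
        e_mid += p * ordered[k // 2]   # middle value when k is odd
        e_high += p * ordered[-1]
    return e_low, e_mid, e_high

low, mid, high = dice_order_means()
print(low, mid, high, low + high)  # 49/24 7/2 119/24 7
```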
We conclude this chapter with an illustration that involves a less obvious 
symmetry and yields a more interesting result. 


52 Example 


A deck of cards is well shuffled. On the average, how many cards are on top 
of the first ace from the top? (Cf. Exercise 13 of Section 4.4.) 


Method. The sample space of 52! elements is enormous. However, for
any arrangement s we have 4 aces, which divide the rest of the deck into
5 parts, consisting of X_1 cards on top of the first ace, X_2 cards between the
first and second ace, etc. (See Fig. 5.36.) Now we can interchange the roles
of X_1, X_2, ..., X_5 by changing the deck s into a new deck s' as follows: Take
all the cards on top of the first ace and put them on the bottom of the deck
(same order), with the first ace now on top of these X_1 cards. A glance at
Fig. 5.36 shows that s → s' is a symmetry. In fact, every arrangement t of a
deck comes from one and only one arrangement s (t = s' has only one solution
s). Also p(s) = p(s') = 1/52!.

5.36 Changing the Deck

Clearly (see Fig. 5.36),


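$$
\begin{aligned}
X_1'(s) &= X_1(s') = X_2(s)\\
X_2'(s) &= X_2(s') = X_3(s)\\
&\ \ \vdots\\
X_5'(s) &= X_5(s') = X_1(s)
\end{aligned}
$$

Thus E(X_1') = E(X_2), E(X_2') = E(X_3), etc. But by Theorem 48,
E(X_i') = E(X_i), so this implies that

$$E(X_1) = E(X_2) = E(X_3) = E(X_4) = E(X_5) \qquad (5.41)$$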


But there are a total of 48 cards exclusive of aces. Thus

$$X_1 + X_2 + \cdots + X_5 = 48$$

Taking expected values and using Equation 5.41, we obtain

$$5E(X_1) = 48$$

and, finally,

$$E(X_1) = \frac{48}{5} = 9.6$$

Intuitively, the 48 cards split up, on the average, into 5 equal groups of
9.6 each.
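The 52! arrangements cannot be listed, but for this question only the positions of the 4 aces matter, and in a well-shuffled deck each of the C(52, 4) = 270,725 sets of ace positions is equally likely. Under that reduction, the sketch below (the helper name `mean_gaps` is ours) averages all five gap sizes exactly; each comes out to 48/5 = 9.6, in agreement with Equation 5.41. Because it loops over every position set, it may take a moment to run.

```python
from fractions import Fraction
from itertools import combinations

def mean_gaps(deck_size=52, aces=4):
    """Exact E(X_1), ..., E(X_{aces+1}): mean number of non-aces in each gap.

    Only the ace positions matter, and for a well-shuffled deck every set of
    positions is equally likely, so we average over all of them (0 = top card).
    """
    totals = [0] * (aces + 1)
    count = 0
    for pos in combinations(range(deck_size), aces):
        prev = -1                               # "position" just above the top card
        for i, q in enumerate(pos):
            totals[i] += q - prev - 1           # cards between previous ace (or top) and ace i
            prev = q
        totals[aces] += deck_size - 1 - prev    # cards below the last ace
        count += 1
    return [Fraction(t, count) for t in totals]

gaps = mean_gaps()
print([str(g) for g in gaps], float(gaps[0]))  # ['48/5', '48/5', '48/5', '48/5', '48/5'] 9.6
```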

The reader should note the common theme of Examples 49 through 52. In
all cases there were several random variables and a relationship among
them. (In Example 52, X_1, X_2, ..., X_5 were the random variables, and
X_1 + ... + X_5 = 48 was the relationship.) Then there was a symmetry (whose
existence, let it be said, comes through a combination of intuition,
experience, and work). Finally, the symmetry induced some connection
between the random variables, which, together with Theorem 48 and the
given relationship, solved the problem, or as much of the problem as the
symmetry could.


EXERCISES 


1. If n fair coins are tossed, prove by symmetry considerations that the 
expected number of heads is n/2. 


2. If n dice are tossed, prove by symmetry considerations that the ex- 
pected number of 3’s is n/6. (Hint: Symmetry tells us that the expected 
number of 3’s = expected number of 4’s = etc.) 


3. Suppose 11 coins are tossed. By symmetry considerations, show that
the expected excess of heads over tails is 0.


4. One hundred people are each asked to think of a number 1 or 2.
Assume that they do this independently and that 1 and 2 are equally likely.
Using symmetry considerations, prove that the expected number of people
who choose 1 is 50.


5. An urn contains 9 white balls and 1 black ball. Balls are successively 
chosen at random and without replacement until the black ball is chosen. On 
the average, how many white balls will be chosen before the black ball is 
reached? 


6. Do Exercise 5 on the assumption that the urn contains 9 white and 2 
black balls. Generalize. 


7. Five dice are tossed. Let Y = largest number tossed plus smallest 
number tossed. Prove that E(Y) = 7. 


8. A deck of cards is well shuffled. Cards are dealt off the top until a 
black card appears. On the average, how many cards are dealt? (Include the 
black card.) 


CHAPTER 6 STANDARD DEVIATION


INTRODUCTION 


In Chapter 5 we defined the mean μ = E(X) as an important single number
associated with a random variable X. It was pointed out that the mean cannot
replace X, because it gives no indication of how the values X(s) are
distributed about the mean. In this chapter we introduce the numbers V(X),
the variance of X, and σ(X) = √V(X), the standard deviation, either of
which is a measure of the “spread” of X about its mean. Figure 6.1 shows
the graphs of three distributions, each with the same mean μ. It is seen
(intuitively) that the “spread” of p_1(x) is greater than that of p_2(x), which in
turn has a greater spread than p_3(x). In the next section we shall learn how
to measure this spread (standard deviation) exactly.

Experimentally, none of the observed values of X need be near the mean
μ = E(X). The standard deviation will be a measure of how far the values
of X are from μ. Since X − μ is the deviation of X from μ, it might appear
that the average value of X − μ, namely E(X − μ), is a measure of the deviation.
However,

$$E(X - \mu) = E(X) - E(\mu) = \mu - \mu = 0$$

Thus X − μ is sometimes positive and sometimes negative, and these values
have 0 average. The simplest nonnegative measure of the deviation of X
from μ is (X − μ)², and this will be the basis of our definition in Section 1.

6.1 Some Distributions
(Panels: p_1(x), p_2(x), p_3(x), each with the same mean μ.)

In Section 5 we shall show how, with the help of the standard deviation, 
we can relate probability to long-term relative frequency as in our original 
motivation of probability. 


1 VARIANCE AND STANDARD DEVIATION 


If X is a random variable, we shall now consistently use the symbol μ to
designate its mean. Occasionally, if the dependence on X is to be made
explicit, we use μ_X. Thus

$$E(X) = \mu = \mu_X \qquad (6.1)$$


1 Definition 


Let X be a random variable. The variance of X, written V(X), is defined
by the formula

$$V(X) = E\big((X - \mu)^2\big) \qquad (6.2)$$

This may be written explicitly in terms of X:

$$V(X) = E\big((X - E(X))^2\big) \qquad (6.3)$$


As noted in the Introduction, V(X) is a measure of how much X varies
from its mean μ. We illustrate with a numerical example.


2 Example 
Let X be defined as follows: X takes the values 1, 2, 4 on sample points
of probability .2, .3, .5, respectively.


Find μ = E(X) and find V(X).
Method. We first find μ = E(X) = Σ p(s)X(s) = (.2)(1) + (.3)(2) +
(.5)(4) = 2.8. We then compute the value of X(s) − μ = X(s) − 2.8 and
square to find [X(s) − 2.8]². We then find the variance of X, which is
the expected value of this random variable: V(X) = E((X − 2.8)²) =
Σ p(s)[X(s) − 2.8]². The result is V(X) = 1.56. The computation is
conveniently done in Table 6.2. We shall later learn how to simplify the
calculation somewhat, although some computation will still be necessary. For a
large table, a desk computer or slide rule will prove useful, even necessary.
For many calculations of this kind, high-speed digital computers are used.


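6.2 Computation of V(X)

p(s)   X(s)   p(s)X(s)   X(s) − 2.8   [X(s) − 2.8]²   p(s)[X(s) − 2.8]²
 .2      1       .2         −1.8           3.24              .648
 .3      2       .6          −.8            .64              .192
 .5      4      2.0          1.2           1.44              .720
Sum             2.8 [= μ]                                   1.560 [= V(X)]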
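Table 6.2 can be reproduced with exact arithmetic. In this sketch the values 1, 2, 4 and probabilities .2, .3, .5 are read off the computation in the Method above, and the helper names `mean` and `variance` are ours. The variance is computed term by term from Equation 6.2, just as the rows of the table are.

```python
from fractions import Fraction

# Example 2: X takes the values 1, 2, 4 on points of probability .2, .3, .5.
probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]
values = [1, 2, 4]

def mean(probs, values):
    """mu = E(X) = sum of p(s) X(s)."""
    return sum(p * x for p, x in zip(probs, values))

def variance(probs, values):
    """V(X) = E((X - mu)^2), Equation 6.2, summed point by point as in Table 6.2."""
    mu = mean(probs, values)
    return sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

mu = mean(probs, values)
var = variance(probs, values)
print(float(mu), float(var))  # 2.8 1.56
```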


We now prove some basic properties of the variance. 


3 Theorem 
Let X be a random variable with mean μ and let c be a constant. Then

$$V(cX) = c^2\,V(X) \qquad (6.4)$$

$$V(X + c) = V(X) \qquad (6.5)$$

$$V(c) = 0 \qquad (6.6)$$

$$V(X) > 0 \quad \text{if } X \ne \text{constant} \qquad (6.7)$$


Proof. Since E(X) = μ, we have E(cX) = cE(X) = cμ. Thus V(cX) may
be found from Equation 6.3:

$$
\begin{aligned}
V(cX) &= E\big((cX - c\mu)^2\big)\\
&= E\big(c^2(X - \mu)^2\big)\\
&= c^2 E\big((X - \mu)^2\big)\\
&= c^2 V(X)
\end{aligned}
$$

This proves Equation 6.4.
We have E(X + c) = E(X) + c = μ + c. Thus

$$
\begin{aligned}
V(X + c) &= E\big(((X + c) - (\mu + c))^2\big)\\
&= E\big((X - \mu)^2\big)\\
&= V(X)
\end{aligned}
$$

This proves Equation 6.5.
Equation 6.6 is a direct consequence of the definition. Since
E(c) = c, we have

$$V(c) = E\big((c - c)^2\big) = E(0) = 0$$
Finally,

$$V(X) = \sum_s p(s)\,[X(s) - \mu]^2$$

is a sum of nonnegative terms. If X is not a constant, then for some s the
term X(s) − μ is not equal to 0. Thus p(s)[X(s) − μ]² > 0 for that summand, and
the entire sum is therefore positive. This completes the proof of the theorem.
Strictly, 6.7 is true only if all sample points have positive probability. This
is seen from the proof, and it is in this sense that we use the inequality 6.7.
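A numerical spot check is no substitute for the proof, but it can catch a misremembered formula such as V(cX) = cV(X). The sketch below (hypothetical helper `variance`) tests Equations 6.4 through 6.7 on the distribution of Example 2 for a few constants c, using exact fractions so that the equality checks are meaningful.

```python
from fractions import Fraction

def variance(probs, values):
    """V(X) = sum of p(s) [X(s) - mu]^2."""
    mu = sum(p * x for p, x in zip(probs, values))
    return sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]  # Example 2
values = [1, 2, 4]
v = variance(probs, values)

for c in (-3, 0, Fraction(1, 2), 12):
    assert variance(probs, [c * x for x in values]) == c ** 2 * v  # (6.4)
    assert variance(probs, [x + c for x in values]) == v           # (6.5)
    assert variance(probs, [c for _ in values]) == 0               # (6.6)
assert v > 0                                                       # (6.7)
print(v)  # 39/25
```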

On page 177 we noted that the significance of the equation E(cX) =
cE(X) was that the mean μ = E(X) had the same units as X. Equation 6.4
shows that V(X) is measured in square units. Thus, if X is measured in feet,
pounds, or years, V(X) will be measured in square feet, square pounds, or
square years. For example, suppose X(s) is the height of student s in feet.
Then we claim that V(X) is measured in square feet. We can see this by
changing units. For example, Y = 12X is the height in inches. Thus V(Y) =
V(12X) = 144V(X), and, therefore, if we convert X to inches by multiplying
by 12, V(X) will be converted to square inches (by multiplying by 144). A
similar remark applies to any change of units.

Equation 6.5 shows that V(X) is unchanged if X is changed to X + c. This, 
in turn, shows that the variance is not affected by a uniform shift in the value 
of X. In Fig. 6.3 the distributions pictured are distributions for X and 
X+c. It is reasonable that the variance, as a measure of spread, should be 
the same for both distributions. 


6.3 Shifted Distribution
(Panels: p_1(x) and p_2(x), the distributions of X and X + c.)
In order to have the same units, we choose the square root of V(X) as the
measure of the spread of X. 


4 Definition 
The standard deviation of X, written σ(X), is defined by the formula

$$\sigma(X) = \sqrt{V(X)} \qquad (6.8)$$


We shall simply use σ to designate the standard deviation of X. If the
dependence on X is to be made explicit, we use σ_X or σ(X). Thus

$$\sigma(X) = \sigma_X = \sigma, \qquad \sigma^2 = V(X) \qquad (6.9)$$

Taking positive square roots in Equations 6.4 through 6.7, we obtain the
following results:¹

$$\sigma(cX) = |c|\,\sigma(X) \qquad (6.10)$$

$$\sigma(X + c) = \sigma(X) \qquad (6.11)$$

$$\sigma(c) = 0 \qquad (6.12)$$

$$\sigma(X) > 0 \quad \text{if } X \ne \text{constant} \qquad (6.13)$$


Equation 6.10 shows that σ has the same units as X. This allows us to
picture σ geometrically, when the values of X are so pictured. In Example
2 we found that V(X) = 1.56. Thus σ = √1.56 = 1.25.² The distribution of
Example 2 is plotted in Fig. 6.4. In it, the mean and standard deviation are
indicated. Since the mean and standard deviation have the same units as X,
the figure is not misleading. [If, for example, V(X) were plotted as a length,
the figure would be misleading, because the units would be wrong.]

6.4 Distribution with Mean and Standard Deviation

¹ √(c²) = |c|. The equation √(c²) = c is not true for negative c. Thus √((−3)²) = √9 = 3, not −3.
By definition, |c| = ±c, where the sign is chosen to make |c| nonnegative.

² Square roots can be obtained easily with the aid of a slide rule. An alternative method is to
use Appendix C. A slightly worse way is to use a table of logarithms. There is also an algorithm,
which nobody remembers or uses.

A useful formula for V(X) is obtained in the following theorem.


5 Theorem 
Let X have mean μ. Then

$$\sigma^2 = V(X) = E(X^2) - \mu^2 \qquad (6.14)$$

or

$$\sigma^2 = V(X) = E(X^2) - [E(X)]^2 \qquad (6.15)$$


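The proof uses the definition and properties of the expectation:

$$
\begin{aligned}
V(X) &= E\big((X - \mu)^2\big)\\
&= E(X^2 - 2\mu X + \mu^2)\\
&= E(X^2) - 2\mu E(X) + \mu^2\\
&= E(X^2) - 2\mu\cdot\mu + \mu^2\\
&= E(X^2) - \mu^2
\end{aligned}
$$

Equation 6.14 can sometimes be used to calculate the expected value of
X². Thus

$$E(X^2) = \sigma^2 + \mu^2 \quad (= \sigma_X^2 + \mu_X^2) \qquad (6.16)$$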
Equation 6.14 is often used for computational purposes. Table 6.5 shows 
the work for the distribution of Example 2. (Cf. Table 6.2.) 


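6.5 Computation of V(X)

p(s)   X(s)   X(s)²   p(s)X(s)   p(s)X(s)²
 .2      1       1        .2          .2
 .3      2       4        .6         1.2
 .5      4      16       2.0         8.0
Sum                  2.8 [= μ]   9.4 [= E(X²)]

$$\sigma^2 = V(X) = E(X^2) - \mu^2 = 9.4 - (2.8)^2 = 1.56$$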
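The shortcut of Theorem 5 is equally easy to code. This sketch, again on the Example 2 distribution, computes E(X²) as in Table 6.5 and compares E(X²) − μ² with the definition. With exact fractions the two agree exactly; with ordinary floating-point numbers the shortcut can lose accuracy when μ is large compared with σ, because it subtracts two nearly equal quantities.

```python
from fractions import Fraction

probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]  # Example 2
values = [1, 2, 4]

mu = sum(p * x for p, x in zip(probs, values))          # E(X)
ex2 = sum(p * x ** 2 for p, x in zip(probs, values))    # E(X^2), as in Table 6.5
var_shortcut = ex2 - mu ** 2                            # Equation 6.14
var_definition = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))  # Equation 6.2
print(float(ex2), float(var_shortcut), var_shortcut == var_definition)  # 9.4 1.56 True
```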


If X has the distribution p(x), we can find the variance directly from the
distribution, much as we did in Tables 6.2 and 6.5. We state the formulas,
reviewing the formula for the mean μ. The proof is as in the proof of Theorem
19 of Chapter 5, and is left to the reader.


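If X has the distribution p(x), x = x_1, ..., x_r, then

$$\mu = E(X) = \sum_i x_i\,p(x_i) \qquad (6.17)$$

$$\sigma^2 = V(X) = \sum_i (x_i - \mu)^2\,p(x_i) \quad (\text{cf. } 6.2) \qquad (6.18)$$

$$\sigma^2 = V(X) = \sum_i x_i^2\,p(x_i) - \mu^2 \quad (\text{cf. } 6.15) \qquad (6.19)$$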
If an experiment is run N times, and values x_1, ..., x_r are observed with
frequencies n_1, ..., n_r, then the above formulas may be used provided p(x)
is taken to be the relative frequency of x. Thus we replace p(x_i) by the
relative frequency f_i = n_i/N of the result x_i. The numbers obtained are
called the sample mean and the sample variance so that they will not be
confused with the mean and the variance. (These latter numbers use probabilities
in their computation; the former use observed relative frequencies.) The
sample mean will be designated by m (sometimes x̄ is used) and the sample
variance will be denoted by s². The sample standard deviation s is defined
as the positive square root of the sample variance. The formulas for
computing m and s² follow from Equations 6.17 through 6.19. We list them here
for convenience.

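Suppose the value x_i occurs n_i times. Let Σ n_i = N = total number of
observations and f_i = n_i/N. Then³

$$m = \bar{x} = \sum_i x_i f_i = \frac{1}{N}\sum_i x_i n_i \qquad (6.20)$$

$$s^2 = \sum_i (x_i - m)^2 f_i = \frac{1}{N}\sum_i (x_i - m)^2 n_i \qquad (6.21)$$

$$s^2 = \sum_i x_i^2 f_i - m^2 = \frac{1}{N}\sum_i x_i^2 n_i - m^2 \qquad (6.22)$$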
If the results of N experiments are x_1, x_2, ..., x_N, it is sometimes
inconvenient to sort out the frequencies of each observation x_i. In that case we
can use the following formulas. Suppose that N experiments yield the
observations x_1, x_2, ..., x_N, with possible repetitions. Then the sample mean
and the sample variance are given by

$$m = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (6.23)$$

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - m)^2 \qquad (6.24)$$

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - m^2 \qquad (6.25)$$

³ Some authors use 1/(N − 1) instead of 1/N in 6.21 and 6.22. The reason for this will be
explained in Section 4.


(If like terms are combined, these equations lead to Equations 6.20 through
6.22.) Equations 6.23 through 6.25 follow directly from the definition of the
mean and variance, provided we regard x_1, ..., x_N as the N values of a random
variable X over a uniform sample space S of N points. For, in this case,

$$E(X) = \sum_s p(s)X(s) = \frac{1}{N}\sum_{i=1}^{N} x_i = m$$

and

$$V(X) = \sum_s p(s)\,[X(s) - E(X)]^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - m)^2 = s^2$$
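Equations 6.20 through 6.25 translate directly into code. In the sketch below the function names and the data list are hypothetical. One function works from raw observations (6.23, 6.24), the other from a frequency table (6.20, 6.22), and both divide by N as the text does. For comparison, Python's standard `statistics.pvariance` also divides by N, while `statistics.variance` divides by N − 1 (the convention mentioned in footnote 3).

```python
import statistics
from collections import Counter

def sample_stats_raw(xs):
    """Sample mean m and sample variance s^2 from raw observations (6.23, 6.24)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n      # divides by N, as in the text
    return m, s2

def sample_stats_freq(freq):
    """The same quantities from a frequency table {x_i: n_i} (6.20, 6.22)."""
    n = sum(freq.values())
    m = sum(x * k for x, k in freq.items()) / n
    s2 = sum(x * x * k for x, k in freq.items()) / n - m ** 2
    return m, s2

data = [3, 5, 5, 6, 1, 5, 6, 3]                 # hypothetical observations
print(sample_stats_raw(data))                   # (4.25, 2.6875)
print(sample_stats_freq(Counter(data)))         # (4.25, 2.6875)
print(statistics.pvariance(data), statistics.variance(data))
```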
We shall illustrate with two examples. 


6 Example 


The daily maximum temperature in New York City during June 1965 was
as follows:

Date      Max. temp. (°F)   Date      Max. temp. (°F)   Date      Max. temp. (°F)
June 1          77          June 11         81          June 21         90
     2          70               12         84               22         91
     3          70               13         68               23         93
     4          74               14         66               24         82
     5          81               15         62               25         79
     6          82               16         62               26         79
     7          88               17         73               27         79
     8          90               18         73               28         88
     9          83               19         80               29         95
    10          88               20         89               30         77
What are the sample mean and the sample standard deviation?


Method. The numbers are sufficiently nonrepetitive, so we shall use
Equations 6.23 through 6.25. If x is the temperature, it is more convenient
to work with y = x − 80. This corresponds to using Y = X − 80 as a new
random variable whose sample mean is m − 80 and whose sample variance
is the same as that of X (see Equation 6.11). We compute as in the following
table:


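  x   y = x − 80    y²        x   y = x − 80    y²        x   y = x − 80    y²
 77       −3          9       81        1          1       90       10        100
 70      −10        100       84        4         16       91       11        121
 70      −10        100       68      −12        144       93       13        169
 74       −6         36       66      −14        196       82        2          4
 81        1          1       62      −18        324       79       −1          1
 82        2          4       62      −18        324       79       −1          1
 88        8         64       73       −7         49       79       −1          1
 90       10        100       73       −7         49       88        8         64
 83        3          9       80        0          0       95       15        225
 88        8         64       89        9         81       77       −3          9

Sum (all 30 days):   Σy = −6,   Σy² = 2,366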


| 
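$$m_y = \frac{1}{N}\sum y = \left(\frac{1}{30}\right)(-6) = -.20 \qquad (y = x - 80,\ x = y + 80)$$

Thus m_x = m_y + 80 = 79.80.

$$s_y^2 = \frac{1}{N}\sum y^2 - m_y^2 = \left(\frac{1}{30}\right)(2{,}366) - .04 = 78.87 - .04 = 78.83$$

Therefore, s_x² = s_y² = 78.83, and

$$s_x = \sqrt{78.83} = 8.88$$

Thus, to 1 decimal place,
m = 79.8°
s = 8.9°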
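The whole of Example 6 can be checked in a few lines. The sketch below uses the temperatures from the table above, shifts them by 80 as in the text, and applies Equations 6.23 and 6.25. It also counts the days falling between m − s and m + s, a figure used in the discussion that follows.

```python
import math

# Daily maximum temperatures (°F), New York City, June 1-30, 1965.
temps = [77, 70, 70, 74, 81, 82, 88, 90, 83, 88,
         81, 84, 68, 66, 62, 62, 73, 73, 80, 89,
         90, 91, 93, 82, 79, 79, 79, 88, 95, 77]

N = len(temps)
ys = [x - 80 for x in temps]                  # shifted data y = x - 80
m_y = sum(ys) / N                             # Equation 6.23
s2_y = sum(y * y for y in ys) / N - m_y ** 2  # Equation 6.25
m_x = m_y + 80                                # shifting moves the mean...
s_x = math.sqrt(s2_y)                         # ...but not the variance

within_one_s = sum(1 for x in temps if m_x - s_x <= x <= m_x + s_x)
print(sum(ys), sum(y * y for y in ys))                              # -6 2366
print(round(m_x, 2), round(s2_y, 2), round(s_x, 2), within_one_s)   # 79.8 78.83 8.88 18
```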
The calculations show that the average maximum daily temperature in New
York City during June 1965 was 79.8°. The sample standard deviation was
s = 8.9°. The residents of the city were as much interested in the sample
mean as in the standard deviation. Note that 79.8° + 8.9° = 88.7°, a
temperature exceeded on 6 days out of 30 (relative frequency 6/30 = 20 percent).
Similarly, 79.8° − 8.9° = 70.9°. The maximum temperature was less than this
also for 6 days. In this case, the maximum temperature fell in the range
m − s to m + s 60 percent of the time. Also, all temperatures fell in the range
m − 2s to m + 2s (62.0° to 97.6° with the rounded values; the two lowest
readings, 62°, sit right at the lower end). We shall have more to say about how
accurately the standard deviation (defined in its mysterious way) serves to
determine the range in which the values X(s) fall.


7 Example 


In Table 1.1, in which the highest number of 3 dice was recorded, compute 
the sample mean and sample standard deviation. 


Method. The following table is self-explanatory. We use Equations 6.20 
through 6.22, because the frequencies are tabulated. The sample mean is 
4.99 (cf. Example 16 of Chapter 5) and the sample standard deviation is 1.20. 


$$m = \frac{1}{N}\sum x_i n_i = \left(\frac{1}{100}\right)(499) = 4.99$$

$$s^2 = \frac{1}{N}\sum x_i^2 n_i - m^2 = \left(\frac{1}{100}\right)(2{,}635) - (4.99)^2 = 1.45$$

$$s = \sqrt{1.45} = 1.20$$
EXERCISES


1. A fair coin is tossed. Let X = 1 if H occurs and let X = 0 if T occurs.
Compute the mean μ of X and the standard deviation σ.


2a. Two coins are tossed. Let X = number of heads. Compute the
mean μ of X and the standard deviation σ.
b. Find μ and σ if X = number of heads when 3 coins are tossed.


3. Let S = {s, f} be a 2-point sample space. Suppose that p(s) = p,
p(f) = q, where p + q = 1. Let X(s) = 1, X(f) = 0. Compute μ and σ
for X.


4. A random variable X has the following distribution table:
x 0.2 — 
p(x) 2! .71 1 
Find μ and σ for X.


5. The grades in a class were distributed as follows:


Number of 
students 


Find the sample average grade and the sample standard deviation. 
6. Let X be a random variable. Which is larger, E(X²) or [E(X)]²? Why?


7. Find an expression for E((X—)*) which involves E(X?), uw, and 
o. (Hint: Use Equation 6. 14.) 


8. A factory produces precision spherical steel balls, 2 cm in diameter. 
A quality-control expert samples 100 of these balls and obtains the follow- 
ing data: 


Diameter (cm) 


Frequency 


Compute the sample mean diameter and the sample standard deviation. 
9. Let c be a constant, and let μ and σ be the mean and standard deviation
of X. Prove:

$$E\big((X - c)^2\big) = \sigma^2 + (\mu - c)^2$$

(Hint: Use Equation 6.16, applied to Y = X − c.)


10. The number of triple plays per season during the 1930s is given in 
the following table: 


1930: 4 1934: 3 1937: 3 
1931: 2 1935: 2 1938: 0 
1932: 2 1936: 2 1939: 2 
1933: 2 


Find the sample mean number of triple plays and the sample standard 
deviation. Do the same for the number of triple plays per season during 
the 1940s, and compare your answers. 


1940: 1 1944: 5 1947: 4 
1941: 0 1945: 0 1948: 3 
1942: 3 1946: 0 1949: 6 
1943: 0 


11. Table 5.11 gives the probability that the high number of three dice is 
1, 2,..., 6. Compute the mean high number and the standard deviation. 
Compare with the sample mean and the sample standard deviation in Example 7.


2 CHEBYSHEV’S THEOREM 


If a random variable X has mean μ and standard deviation σ, what
information can be deduced concerning the distribution of the values of X?
Intuitively, we should feel that the values of X are on either side of μ, and
that a small σ ensures that the values of X are near μ with a large probability.
For experimental results, a small sample standard deviation indicates that a
large percentage of the samples are near the sample mean m. Chebyshev’s
theorem gives precise estimates concerning what percentage and how near.

The distance of X(s) from the mean μ is |X(s) − μ|. Absolute values are
necessary to assure a nonnegative answer regardless of whether X(s) is smaller
or larger than μ. Thus the inequality |X(s) − μ| < k can be paraphrased
“X(s) is within k units of μ.” Similarly, |x − μ| ≥ k means “x is at least k
units away from μ.”
It is convenient to measure the distance of X from μ in units of σ. Thus we
might ask for the probability that X is within 2σ of μ, or for the probability
that X is more than 2.5σ away from μ. Chebyshev’s theorem gives an
estimate for the probability that X is within zσ of μ. (z is any positive real
number.)


8 Theorem. Chebyshev 
If a random variable X has mean μ and standard deviation σ > 0, then the
probability that |X − μ| < zσ is at least 1 − 1/z².

Before proving this theorem, we give a few examples. For z = 2 we find
that the probability that |X − μ| < 2σ is at least 1 − 1/4 = 3/4. Thus Chebyshev’s
theorem guarantees that X is strictly within 2σ of μ with probability ≥ 3/4. In
practice, 3/4 is often well exceeded, but the beauty of this theorem is its
generality; it works for all random variables. For z = 1 we find that the
probability that |X − μ| < σ is at least 0 (i.e., ≥ 0), a noninformative
statement. Thus Chebyshev’s theorem gives no information for z = 1 (and
similarly for z < 1).

Proof. We let

p = probability that |X(s) − μ| < zσ

Thus

1 − p = probability that |X(s) − μ| ≥ zσ

and, by definition,

$$p = \sum_{|X(s) - \mu| < z\sigma} p(s), \qquad 1 - p = \sum_{|X(s) - \mu| \ge z\sigma} p(s) \qquad (6.26)$$

We now use the definition of σ:

$$\sigma^2 = \sum_s p(s)\,[X(s) - \mu]^2$$

By only summing some of the terms we cannot obtain a larger answer:

$$\sigma^2 \ge \sum_{|X(s) - \mu| \ge z\sigma} p(s)\,[X(s) - \mu]^2$$

By replacing [X(s) − μ]² by its least possible value z²σ², we similarly cannot
increase the value of the sum:

$$\sigma^2 \ge \sum_{|X(s) - \mu| \ge z\sigma} p(s)\,z^2\sigma^2 = z^2\sigma^2 \sum_{|X(s) - \mu| \ge z\sigma} p(s)$$

Using Equation 6.26 we have

$$\sigma^2 \ge z^2\sigma^2(1 - p)$$

Dividing by σ²z² (which is positive), we have

$$\frac{1}{z^2} \ge 1 - p$$

Finally, adding p − 1/z² to both sides of this inequality, we obtain the result:

$$p \ge 1 - \frac{1}{z^2}$$
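The theorem can be tested on particular distributions. The sketch below (the helper names `chebyshev_bound` and `prob_within` are ours) computes P(|X − μ| < zσ) for a fair die and checks it against 1 − 1/z². It also evaluates the three-point distribution of Exercise 8 with z = 2, for which the bound is attained exactly, which shows that the bound cannot be improved in general.

```python
import math

def chebyshev_bound(z):
    """Lower bound 1 - 1/z**2 from Theorem 8 (informative only for z > 1)."""
    return 1 - 1 / z ** 2

def prob_within(dist, z):
    """P(|X - mu| < z*sigma) for a finite distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))
    return sum(p for x, p in dist.items() if abs(x - mu) < z * sigma)

die = {x: 1 / 6 for x in range(1, 7)}
for z in (1.2, 1.5, 2, 3):
    assert prob_within(die, z) >= chebyshev_bound(z)

# Equality case (Exercise 8 with z = 2): X = 0, -2, 2 with probabilities 3/4, 1/8, 1/8.
tight = {0: 0.75, -2: 0.125, 2: 0.125}
print(prob_within(tight, 2), chebyshev_bound(2))  # 0.75 0.75
```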


The numbers 1—1/z? are called Chebyshev bounds. Some values are 
tabulated and graphed in Table 6.6. 


6.6 Chebyshev Bounds 


For example, taking z = 1.054, the value of X will be within 1.054σ of μ
with probability at least .100. At the other extreme, Chebyshev’s theorem
shows that the value of X has probability at least 99.9 percent of falling
within 31.62 standard deviations of μ.
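Since Table 6.6 is just the function 1 − 1/z², it can be regenerated, and inverted, on demand. In this sketch (function names hypothetical) `z_for_bound` solves 1 − 1/z² = p for z; for p = .100 it gives z ≈ 1.054, and for p = .999 it gives z ≈ 31.62, the two extremes quoted above.

```python
import math

def chebyshev_bound(z):
    """Chebyshev bound 1 - 1/z**2."""
    return 1 - 1 / z ** 2

def z_for_bound(p):
    """Smallest z with 1 - 1/z**2 >= p, i.e. z = 1/sqrt(1 - p), for 0 <= p < 1."""
    return 1 / math.sqrt(1 - p)

for p in (0.10, 0.50, 0.75, 0.90, 0.99, 0.999):
    z = z_for_bound(p)
    print(f"p = {p:<6} z = {z:.3f}  check: {chebyshev_bound(z):.3f}")
```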


9 Example 

The average grade on an exam was 80 percent and the sample standard 
deviation was 5 percent. How many of the grades were between 65 and 95 
percent? Within what range is it known that at least half the grades fell? 


Method. We have m = 80, s = 5. Since 65 = 80 − 15 = m − 3s and 95 =
80 + 15 = m + 3s, we are looking for the relative frequency with which the grade X
differed from the mean m by less than 3s. Here z = 3 and 1 − 1/z² = .889. Thus
more than 88 percent of the exams were in the range 65 through 95 percent.
For the second part we must find a value of z such that 1 − 1/z² = 1/2.
Solving, we obtain z = √2 = 1.414. (This can also be read from Table 6.6.)
Rounding z up keeps the statement true, so at least half the grades fell within
1.42s of m. Since 1.42s = (1.42)(5) = 7.1, we find that at least half the grades
were in the range 80 − 7.1 to 80 + 7.1, or from 72.9 to 87.1.
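The arithmetic of Example 9 follows the same pattern: choose z, then report m ± zs. The helper `chebyshev_interval` below is our own; note that it uses the exact z = √2 rather than the rounded 1.42, so its interval is slightly narrower than the one in the text.

```python
import math

def chebyshev_interval(m, s, p):
    """Interval m ± z*s that Chebyshev's theorem guarantees holds at least fraction p."""
    z = 1 / math.sqrt(1 - p)      # solves 1 - 1/z**2 = p
    return z, m - z * s, m + z * s

m, s = 80, 5                      # Example 9: mean 80, standard deviation 5
print(round(1 - 1 / 3 ** 2, 3))   # z = 3 (65 to 95): 0.889
z, lo, hi = chebyshev_interval(m, s, 0.5)
print(round(z, 3), round(lo, 2), round(hi, 2))  # 1.414 72.93 87.07
```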

We remark once again that Chebyshev’s theorem yields a very
conservative bound for the probability that |X − μ| < zσ. For example, in
Example 6 we found that fully 60 percent of the observations were within
z = 1 standard deviation of the mean. Chebyshev’s theorem guarantees 0
percent! Nonetheless, Chebyshev’s theorem does give us information for
z > 1. We can summarize as follows.

On the one hand, Chebyshev’s theorem gives a universal estimate, valid
for all random variables.⁴ On the other hand, for a specific distribution, or
for a special type of distribution, the estimate can be very weak.


EXERCISES 


1. A random variable X has mean 8 and standard deviation .25. What can
be said about the probability that X is between 7 and 9?


2. On some beautiful tropical island, the mean daily noon temperature 
during the year is 75° and the standard deviation is 3°. An advertising 
agency wishes to use the statement “The noon temperature here is between 
70° and 80° at least for x percent of the year.” As a consultant you are 
required to find an honest value for x. Do so. Suppose the agency says that 
your answer is ridiculously conservative and demands a more realistic 
answer. How do you proceed? 


3. In Exercise 2 the agency wants to use x = 95. Show how to do this by 
changing the extreme temperatures 70 and 80°. 


4. A random variable X has μ = 3 and σ = 1. What can you say about the
probability that X > 0?


5. A random variable X has μ = 100 and σ = .1. Find an interval of
numbers such that X is in this interval with probability .99 or more.


6. For the random variable of Exercise 5, find an estimate for the
probability that X is between 99.5 and 100.5.


7. For the random variable of Exercise 5, find an estimate for the
probability that X is between 99.4 and 100.2.


*8. Theorem 8 shows that p ≥ 1 − 1/z². By carefully going over the proof,
prove (for fixed z ≥ 1) that p > 1 − 1/z² except for one case: X = μ, μ − zσ,
and μ + zσ with probabilities 1 − 1/z², 1/(2z²), and 1/(2z²), respectively.


4 We are only considering random variables on finite sample spaces, although Chebyshev’s 
theorem can be generalized to more complicated situations. 
3 VARIANCE OF SUMS


In Section 4 of Chapter 5 we saw how the formula E(X + Y) = E(X) + E(Y)
was extremely useful for finding many expectations. It turns out that the
analogous formula for the variance, V(X + Y) = V(X) + V(Y), is valid for
independent random variables.


10 Theorem 
If X and Y are independent random variables on some sample space, then

$$V(X + Y) = V(X) + V(Y) \qquad (6.27)$$

Proof. Let μ_x = E(X) and μ_y = E(Y). By Theorem 46 of Chapter 5,
E(XY) = E(X)E(Y) = μ_x μ_y. Also, E(X + Y) = μ_x + μ_y by Theorem 27 of
Chapter 5. Therefore, using Theorem 5, we have

$$
\begin{aligned}
V(X + Y) &= E\big((X + Y)^2\big) - (\mu_x + \mu_y)^2\\
&= E(X^2 + 2XY + Y^2) - (\mu_x^2 + 2\mu_x\mu_y + \mu_y^2)\\
&= E(X^2) + 2E(XY) + E(Y^2) - \mu_x^2 - 2\mu_x\mu_y - \mu_y^2\\
&= E(X^2) + 2\mu_x\mu_y + E(Y^2) - \mu_x^2 - 2\mu_x\mu_y - \mu_y^2\\
&= E(X^2) - \mu_x^2 + E(Y^2) - \mu_y^2\\
&= V(X) + V(Y)
\end{aligned}
$$

This completes the proof.
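Theorem 10 is easy to confirm for two independent dice by listing all 36 equally likely pairs. The sketch below (the helper `var` is ours) shows V(X + Y) = 35/12 + 35/12 = 35/6, and then shows what goes wrong without independence: taking Y = X gives V(2X) = 4V(X), not 2V(X).

```python
from fractions import Fraction
from itertools import product

def var(pairs):
    """Variance of a random variable given as a list of (probability, value)."""
    mu = sum(p * v for p, v in pairs)
    return sum(p * (v - mu) ** 2 for p, v in pairs)

die = range(1, 7)
p = Fraction(1, 36)                       # two independent fair dice
pairs = list(product(die, die))

v_x = var([(p, x) for x, y in pairs])
v_y = var([(p, y) for x, y in pairs])
v_sum = var([(p, x + y) for x, y in pairs])
print(v_x, v_y, v_sum)                    # 35/12 35/12 35/6

# Without independence the formula fails: take Y = X (the same die twice).
v_double = var([(p, 2 * x) for x, y in pairs])
print(v_double, v_double == 4 * v_x)      # 35/3 True
```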
By exactly the same method we can also prove the result for a sum of
any finite number of independent random variables. We shall just state the
result.


11 Theorem 
Let X_1, X_2, ..., X_k be pairwise independent random variables on some
sample space. (This means that any two of these random variables are
independent.) Then

$$V(X_1 + \cdots + X_k) = V(X_1) + \cdots + V(X_k) \qquad (6.28)$$


Remark. It is the algebraic simplicity of these theorems which is ample 
justification for the use of V(X) as a measure of dispersion. 

If we translate Theorems 10 and 11 into statements about standard 
deviations, we obtain the following corollary: 


12 Corollary 


Let X_1, ..., X_k be random variables, any two of which are independent. Let
σ_i be the standard deviation of X_i. Let X = Σ_i X_i and let σ = σ(X) = σ(Σ_i X_i).
Then

$$\sigma = \sqrt{\textstyle\sum_i \sigma_i^2} \qquad (6.29)$$




In particular, if X and Y are independent with standard deviations σ_x and
σ_y, then

$$\sigma(X + Y) = \sqrt{\sigma_x^2 + \sigma_y^2} \qquad (6.30)$$

The proof merely uses σ² = V(X), σ_i² = V(X_i). The result follows from
Equation 6.28.


Remark. It must be remembered that Equations 6.27 through 6.30 are 
not true in general and that independence of any two of the random variables 
was assumed. 

The value of the random variable X; will be determined in many applica- 
tions from the outcome of experiment i. If the experiments are independent, 
then the X; are independent, by Theorem 51 of Chapter 3 and Definition 42 
of Chapter 5, and the formulas may be used. 

We now illustrate these theorems with some examples. 


13 Example


Ten coins are tossed. Find the mean number of heads and the standard 
deviation. 


Method. As in Example 29 of Chapter 5, we let N_i = the number of heads
on coin i (= 0 or 1, each with probability 1/2). Thus

$$N = N_1 + \cdots + N_{10} = \text{number of heads}$$

The N_i are independent. Therefore, setting σ_i = σ(N_i), we have

$$\sigma(N) = \sqrt{\sigma_1^2 + \cdots + \sigma_{10}^2} \qquad (6.31)$$

Also,

$$E(N) = E(N_1) + \cdots + E(N_{10}) \qquad (6.32)$$

But each N_i has the same distribution p(x), given by the formulas p(0) = 1/2,
p(1) = 1/2. Therefore E(N_i) = Σ_j x_j p(x_j) = 0·(1/2) + 1·(1/2) = 1/2, and
V(N_i) = Σ_j (x_j − μ)² p(x_j) = (1/4)(1/2) + (1/4)(1/2) = 1/4. Thus σ_i² = 1/4.
Finally, using Equations 6.31 and 6.32, we obtain

$$\sigma(N) = \sqrt{10\cdot\tfrac{1}{4}} = \sqrt{2.5} \quad (\approx 1.58)$$

and, as in Example 29 of Chapter 5,

$$E(N) = 10\cdot\tfrac{1}{2} = 5$$


14 Example 


Twenty dice are tossed. Find the average number of 6’s tossed and the 
standard deviation for the number of 6’s. 


Method. Let N_k = number of 6’s on the kth die. Then

$$N = \sum_{k=1}^{20} N_k = \text{number of 6's tossed}$$
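The N_k are independent. Therefore, if we set μ_k = E(N_k) and σ_k = σ(N_k),
we have

$$\sigma(N) = \sqrt{\textstyle\sum_k \sigma_k^2}, \qquad E(N) = \sum_k \mu_k \qquad (6.33)$$

As in Example 13, each N_k has the same distribution. Here the distribution
is given by Table 6.7, in which the computation of μ_k and σ_k is given.


6.7 Distribution for N_k

x     p(x)    x p(x)          x² p(x)
0     5/6      0               0
1     1/6     1/6             1/6
Sum           1/6 [= μ_k]     1/6 [= E(N_k²)]

σ_k² = E(N_k²) − μ_k² = 1/6 − 1/36 = 5/36


Thus μ_k = 1/6 and σ_k² = 5/36. Finally, from Equations 6.33, we find

$$\sigma(N) = \sqrt{20\cdot\tfrac{5}{36}} = \tfrac{5}{3} \ (\approx 1.67), \qquad E(N) = 20\cdot\tfrac{1}{6} = \tfrac{10}{3} \ (\approx 3.33)$$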
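As an independent check on Examples 13 and 14, the count of successes can also be handled through its exact distribution P(N = k) = C(n, k)pᵏ(1 − p)ⁿ⁻ᵏ rather than as a sum of indicators. The sketch below (helper name ours; `math.comb` requires Python 3.8 or later) recovers E(N) = 5, V(N) = 5/2 for 10 coins and E(N) = 10/3, V(N) = 25/9 for the 6’s among 20 dice.

```python
from fractions import Fraction
from math import comb, sqrt

def count_mean_var(n, p):
    """Exact mean and variance of N = number of successes in n independent trials.

    Uses P(N = k) = C(n, k) p^k (1 - p)^(n - k) directly, as a check on the
    sum-of-indicators argument of Theorem 11.
    """
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2
    return mean, var

coins = count_mean_var(10, Fraction(1, 2))   # Example 13
sixes = count_mean_var(20, Fraction(1, 6))   # Example 14
print(f"coins: E = {coins[0]}, V = {coins[1]}, sigma = {sqrt(coins[1]):.2f}")  # coins: E = 5, V = 5/2, sigma = 1.58
print(f"sixes: E = {sixes[0]}, V = {sixes[1]}, sigma = {sqrt(sixes[1]):.2f}")  # sixes: E = 10/3, V = 25/9, sigma = 1.67
```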


15 Example 
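Six dice are tossed. Let A = average value of the numbers tossed. Find E(A)
and σ(A).


Method. If X_k = the number on the kth die, we have

$$A = \tfrac{1}{6}(X_1 + \cdots + X_6) \qquad (6.34)$$

Since the X_k are independent, we have

$$
\begin{aligned}
\sigma(A) &= \sigma\big(\tfrac{1}{6}(X_1 + \cdots + X_6)\big)\\
&= \tfrac{1}{6}\,\sigma(X_1 + \cdots + X_6)\\
&= \tfrac{1}{6}\sqrt{\sigma_1^2 + \cdots + \sigma_6^2}
\end{aligned}
\qquad (6.35)
$$

where σ_k = σ(X_k). But σ_1 = σ_2 = ... = σ_6, because each X_k has the same
distribution, given by p(x) = 1/6, x = 1, 2, ..., 6. It follows that E(X_k) = μ_k =
(1/6)(1 + 2 + ... + 6) = 7/2 and

$$V(X_k) = \sigma_k^2 = \tfrac{1}{6}\Big[\big(1 - \tfrac{7}{2}\big)^2 + \big(2 - \tfrac{7}{2}\big)^2 + \cdots + \big(6 - \tfrac{7}{2}\big)^2\Big] = \tfrac{35}{12}$$

Thus, from Equation 6.35,

$$\sigma = \tfrac{1}{6}\sqrt{6\cdot\tfrac{35}{12}} = \tfrac{1}{6}\sqrt{\tfrac{35}{2}} = .697$$

From Equation 6.34,

$$E(A) = \tfrac{1}{6}\,[E(X_1) + \cdots + E(X_6)] = \tfrac{1}{6}\cdot 6\cdot\tfrac{7}{2} = \tfrac{7}{2}$$

Thus A has μ = 3.5, σ = .697. In contrast, X_1 has μ_1 = 3.5, σ_1 = √(35/12) = 1.71.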
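Example 15 can be verified by brute force over all 6⁶ = 46,656 outcomes. The sketch below (helper name ours) computes E(A) and V(A) exactly; V(A) = 35/72 is V(X_1)/6, so σ(A) = σ(X_1)/√6 ≈ .697.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def average_of_dice(k):
    """Exact E(A) and V(A) for A = average of k fair dice, by full enumeration."""
    p = Fraction(1, 6 ** k)
    averages = [Fraction(sum(s), k) for s in product(range(1, 7), repeat=k)]
    mean = sum(p * a for a in averages)
    var = sum(p * (a - mean) ** 2 for a in averages)
    return mean, var

mean, var = average_of_dice(6)          # 6**6 = 46,656 outcomes
print(mean, var, round(sqrt(var), 3))   # 7/2 35/72 0.697
```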
The smaller standard deviation is plausible if we go back to the idea of
experiments to determine A and X_1, respectively. One experiment for A is
to toss 6 dice and average the numbers that turn up. One experiment for X_1
is to toss 1 die and observe its value. It seems intuitively clear that the values
obtained for A will not be as “spread out” as the values obtained for X_1.


EXERCISES 


1. Two dice are tossed. Find the expected value and the standard devia- 
tion for the sum of the numbers that turn up. 


2. Suppose that a printer makes an average of .1 errors per page. On 
the average, how many errors will appear in a 300-page book, and with what 
standard deviation? (Assume that the number of errors on each page is 
independent.) 


3. A certain type of seed has probability .8 of germinating. One hundred 
seeds are planted. Find the expected number of germinations and the 
standard deviation. 


4. Alan, Bob, and Carl choose integers at random. Alan chooses a
number between 1 and 4 inclusive. Bob chooses a number between 1 and 10
inclusive and Carl chooses a number between 1 and 5 inclusive. Determine
the expected value of the sum and its standard deviation.


5. A multiple choice test has 6 questions with a choice of 3 answers, 
4 questions with a choice of 4 answers, and 10 true-or-false questions. Each 
question is worth 5 points. What is the mean grade if the answers are given 
at random? What is the standard deviation? 


6. One hundred coins are tossed, and 60 of these coins land heads. How 
many standard deviations is this away from the expected number (50) of 
heads? (Note: More than 3 standard deviations is extremely unlikely.) 


7. As in Exercise 6, 1,000 coins are tossed, and 600 land heads. How many standard deviations is this away from the expected number (500) of heads?


8. Politicians A and B are competing for the office of Chief Nay-sayer 
in an election. They both feel that everybody will vote at random and inde- 
pendently in this election, because nobody knows or cares about the 
politicians or the office. However, they know from past experience that 
people will vote for the first name listed with probability .51 and for the 
second name with probability .49. Since 1,000 people are going to vote, 
each feels that the first name is almost surely the winner. Taking for granted 
the numerical and political assumptions of A and B, how important is it to 
be listed first? Do the same if the election involves 100,000 people. (Note:
A complete answer is not possible at this time, and must await Section 4 of 
Chapter 7. However, use Chebyshev’s theorem for some estimate.) 


9. If $X$ and $Y$ are independent random variables, prove that

$$V(aX + bY) = a^2V(X) + b^2V(Y)$$

Generalize to more than 2 variables.


10. By Equation 6.4, V(2X) = 4V(X). But by Equation 6.26, V(2X) = 
V(X+X) = V(X) + V(X) = 2V(X). Which is wrong, and why? 


11. A gambling house has a game in which they lose $10 with probability 
.1, win $2 with probability .5, and win $.50 with probability .4. On an average 
night, 1,000 independent games will be played. Compute the house’s ex- 
pected winnings and the standard deviation of winnings. 


4 INDEPENDENT TRIALS 


Throughout the course of the text we have made references to the notion of 
“repeating an experiment (independently) a large number of times.” The
very notion of probability was motivated in this way, as was the notion of 
the expected value of a random variable. We shall now put the idea of re- 
peating an experiment independently into the formal framework of prob- 
ability theory and relate the notions of mean and sample mean and of 
variance and sample variance. 

In what follows we fix $S$ as a finite sample space and let $X$ be a random variable on $S$. It is desired to take $N$ readings of $X$ by repeating the experiment for $S$ independently $N$ times. As we have seen in Chapter 3, the appropriate sample space to consider is $S^N = S \times S \times \cdots \times S$ ($N$ factors). $S^N$ consists of all $N$-tuples $(s_1, s_2, \ldots, s_N)$ with $s_i \in S$ and with $p(s_1, \ldots, s_N) = p(s_1) \cdots p(s_N)$. $(s_1, \ldots, s_N)$ may be regarded as a possible outcome of the sequence of $N$ experiments ($s_1$ on the first experiment, etc.).

In what follows, we let $X_i$ be the value of $X$ determined on the $i$th experiment. Thus $X_i(s_1, \ldots, s_N) = X(s_i)$. The underlying sample space for $X_1, X_2, \ldots, X_N$ is $S^N$.

Using these ideas we are able to conceive of repeating an experiment $N$ times and reading $N$ values $x_1, \ldots, x_N$ of a random variable $X$ as running one experiment in $S^N$ and considering the $N$ values of the $N$ different random variables $X_1, \ldots, X_N$.


16 Theorem 
The random variables $X_1, \ldots, X_N$ are independent. Each $X_i$ has the same distribution as $X$.

Proof. Independence of the $X_i$ means that the events $A_i$ (in $S^N$) consisting of all $s$ with $X_i(s) = x_i$ are independent for all choices of $x_i$. This is an immediate consequence of Theorem 52 of Chapter 3, because each $A_i$ is determined by an event in the $i$th component of $S^N$ [namely, $X(s_i) = x_i$]. Also, if $p(x)$ is the distribution function of $X$, the probability that $X_i(s_1, \ldots, s_N) = x_i$ is the same as the probability that $X(s_i) = x_i$. (This follows from Equation 3.35.) Thus the distribution function of $X_i$ is also $p(x)$. This completes the proof.

For example, if $X$ is the number on a die and if $N = 4$, then we consider $S^4$, the sample space for 4 independent dice. $X_3$ is the number on the third die. Both $X_3$ and $X$ have the same distribution, given by $p(x) = \tfrac16$ for $x = 1, 2, \ldots, 6$.
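Theorem 16 can be checked by brute force on a small case. The sketch below (Python standard library; the names `space`, `prob`, `same_distribution`, and `independent` are hypothetical) builds $S^N$ for a fair die with $N = 3$, gives each $N$-tuple the product probability, and verifies that every coordinate variable $X_i$ has the die's distribution and that, as a spot check of independence, $p(X_1 = a \text{ and } X_3 = b) = p(a)p(b)$ for all $a, b$.

```python
from fractions import Fraction
from itertools import product
import math

S = range(1, 7)                            # sample space for one die
p = {s: Fraction(1, 6) for s in S}         # fair-die probabilities
N = 3

# Product space S^N: p(s_1, ..., s_N) = p(s_1) * ... * p(s_N)
space = {t: math.prod(p[s] for s in t) for t in product(S, repeat=N)}

def prob(event):
    """Probability of the set of N-tuples t for which event(t) is true."""
    return sum(w for t, w in space.items() if event(t))

# Each X_i(s_1, ..., s_N) = s_i has the same distribution as X.
same_distribution = all(
    prob(lambda t, i=i, x=x: t[i] == x) == p[x]
    for i in range(N) for x in S
)

# Spot check of independence for X_1 and X_3.
independent = all(
    prob(lambda t, a=a, b=b: t[0] == a and t[2] == b) == p[a] * p[b]
    for a in S for b in S
)

print(same_distribution, independent)      # True True
```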


17 Corollary 


The mean and standard deviation of $X_k$ are the same as the mean and standard deviation of $X$:

$$E(X_k) = E(X) = \mu \qquad \sigma(X_k) = \sigma(X) = \sigma \qquad V(X_k) = V(X) \qquad (k = 1, \ldots, N) \qquad (6.36)$$

This is so because $X_k$ has the same distribution as $X$.

The sample mean is simply the average of the $N$ readings of $X$. We may therefore regard it as a random variable on $S^N$: sum $X_1, \ldots, X_N$ and divide by $N$. The sample variance and sample standard deviation are similarly defined.


18 Definition 
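With $X, X_1, \ldots, X_N$ as above, we define⁵

$$\bar{X} = \frac{1}{N}(X_1 + \cdots + X_N) \qquad (6.37)$$

$$S^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^2 \qquad S = \sqrt{S^2} \qquad (6.38)$$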


Remark. It is appropriate that the sample mean and variance are random variables on $S^N$. For all this means is that the value of the sample mean (and variance) depends upon the outcome $(s_1, \ldots, s_N)$ in $S^N$ or, equivalently, upon the outcomes of the $N$ experiments.


⁵ Some confusion in notation occurs here. We are using $S$ to denote a sample space and $S^N$ to denote the product of $S$ with itself $N$ times. Now we use $S^2$ to denote the sample variance and $S$ the sample standard deviation. The usage will be clear from the context.
19 Theorem
With $\bar{X}$, $\mu$, and $\sigma$ defined as above, we have

$$E(\bar{X}) = \mu \qquad \sigma(\bar{X}) = \frac{\sigma}{\sqrt{N}} \qquad V(\bar{X}) = \frac{1}{N}V(X) \qquad (6.39)$$


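Proof. By Equations 6.36 and 6.37, and the properties of expectations, we have

$$E(\bar{X}) = \frac{1}{N}\left[E(X_1) + \cdots + E(X_N)\right] = \frac{1}{N} \cdot N\mu = \mu$$

Since the $X_i$ are independent, we may use Equation 6.28. Also, using Equation 6.4, we obtain

$$V(\bar{X}) = V\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) = \frac{1}{N^2}\sum_{i=1}^{N} V(X_i) = \frac{1}{N^2} \cdot N\,V(X) = \frac{1}{N}V(X)$$

Finally, taking square roots, we obtain

$$\sigma(\bar{X}) = \frac{\sigma}{\sqrt{N}}$$

This completes the proof.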


Remark. We can paraphrase the result as follows: “If values of a random variable $X$ (mean $\mu$, standard deviation $\sigma$) are found in $N$ independent experiments, then the mean value of the sample mean $\bar{X}$ is also $\mu$, but the standard deviation of the sample mean $\bar{X}$ is $\sigma/\sqrt{N}$.” As in the remark following Example 15, we expect the values of the average $\bar{X}$ to be less dispersed than the values of $X$.
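Equation 6.39 can also be confirmed exactly on small cases. The sketch below (Python standard library; `sample_mean_stats` is a hypothetical helper) enumerates $S^N$ for $N$ readings of a random variable given by a list of values and probabilities, and returns $E(\bar{X})$ and $V(\bar{X})$. For a fair die these should be $\tfrac72$ and $V(X)/N = \tfrac{35}{12N}$.

```python
from fractions import Fraction
from itertools import product
import math

def sample_mean_stats(values, probs, n):
    """Exact E and V of the sample mean of n independent readings of X,
    where X takes values[j] with probability probs[j]."""
    pairs = []
    for idx in product(range(len(values)), repeat=n):
        weight = math.prod(probs[j] for j in idx)         # product probability
        xbar = Fraction(sum(values[j] for j in idx), n)   # sample mean
        pairs.append((xbar, weight))
    mean = sum(x * w for x, w in pairs)
    variance = sum((x - mean) ** 2 * w for x, w in pairs)
    return mean, variance

die = [1, 2, 3, 4, 5, 6]
fair = [Fraction(1, 6)] * 6
for n in (1, 2, 3, 4):
    mean, var = sample_mean_stats(die, fair, n)
    print(n, mean, var, var == Fraction(35, 12) / n)
```

Doubling $N$ halves the variance, so the standard deviation shrinks only by the factor $1/\sqrt{2}$.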

We might suppose that the expected value of $S^2$ is $\sigma^2$. It is not.


*20 Theorem 
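If $S^2$ is the sample variance and $\sigma^2$ the variance of a random variable, we have

$$\sigma^2 = \frac{N}{N-1}E(S^2) \qquad (6.40)$$

Proof. Using Equations 6.38 and 6.25, we have

$$S^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^2 = \frac{1}{N}\left(\sum_{i=1}^{N}X_i^2\right) - \bar{X}^2$$

We take expected values, using Equation 6.16 [$E(X^2) = \sigma^2 + \mu^2$], applied both to each $X_i$ and to $\bar{X}$ (whose mean is $\mu$ and whose variance is $\sigma^2/N$, by Theorem 19). Thus

$$E(S^2) = \frac{1}{N}\sum_{i=1}^{N}E(X_i^2) - E(\bar{X}^2) = (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{N} + \mu^2\right) = \frac{N-1}{N}\sigma^2$$

After multiplying by $N/(N-1)$, we obtain the result.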


Remark. $\sigma^2$ is a number whose value depends on how the values of $X$ are distributed on the underlying sample space. $S^2$ is a random variable whose value depends on a point in $S^N$ and on the random variable $X$ (i.e., on the outcome of $N$ experiments, and the value of $X$ on each experiment). The theorem relates $\sigma^2$ and $S^2$: $\sigma^2 = E\left(\frac{N}{N-1}S^2\right)$. If you do not know the mean of a random variable $X$ and if you are allowed one experiment, yielding outcome $s_0$ and value $x_0 = X(s_0)$, then $x_0$ is a good guess for $\mu$. (The smaller $\sigma$ is, the more likely it is that $x_0$ will be close to $\mu$.) When we compute the sample variance $s^2$ for a series of experiments, we are, in fact, finding one value of $S^2$ as the result of one experiment (in $S^N$). Since $E\left(\frac{N}{N-1}S^2\right) = \sigma^2$, we consider $\frac{N}{N-1}s^2$ to be a good estimate for $\sigma^2$. But

$$\frac{N}{N-1}s^2 = \frac{N}{N-1}\cdot\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$

For this reason many authors prefer using

$$\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$

as their definition of sample variance. It is a more sensible estimate for $\sigma^2$.
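The bias described in Theorem 20 can be seen exactly on a fair die. The sketch below (Python standard library; `sample_variance` and `expected_sample_variance` are hypothetical names) averages $S^2$ of Equation 6.38 (divisor $N$) over all $6^N$ equally likely tuples and compares the result with $\frac{N-1}{N}\sigma^2$. It also shows that Python's `statistics` module offers both conventions: `statistics.pvariance` divides by $N$, while `statistics.variance` divides by $N - 1$.

```python
from fractions import Fraction
from itertools import product
import statistics

SIGMA2 = Fraction(35, 12)          # variance of a single fair die

def sample_variance(xs):
    """S^2 as in Equation 6.38: divisor N, not N - 1."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / n

def expected_sample_variance(n):
    """E(S^2) over all 6**n equally likely tuples of fair-die readings."""
    tuples = list(product(range(1, 7), repeat=n))
    return sum(sample_variance(t) for t in tuples) / len(tuples)

for n in (2, 3, 4):
    e = expected_sample_variance(n)
    print(n, e, e == Fraction(n - 1, n) * SIGMA2)

# The standard library's two divisors (Fraction input keeps results exact):
data = [Fraction(x) for x in (2, 4, 4, 4, 5, 5, 7, 9)]
print(statistics.pvariance(data), statistics.variance(data))
```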


21 Example 


A biased coin has probability .4 of landing heads and probability .6 of 
landing tails. Fifty independent tosses are used to determine the relative 
frequency F of heads. What is the expected value of F and the standard 
deviation of F? 
Method. Let $X = 1$ if the coin lands H and let $X = 0$ if the coin lands T. The relative frequency $F$ is simply

$$F = \frac{1}{50}(\text{number of heads in 50 trials}) = \frac{1}{50}(X_1 + X_2 + \cdots + X_{50}) = \bar{X}$$

Thus, by Equation 6.39,

$$E(F) = \mu \qquad \sigma(F) = \frac{\sigma}{\sqrt{50}}$$

where $E(X) = \mu$, $\sigma(X) = \sigma$. But $E(X)$ and $\sigma(X)$ can be easily computed from the distribution of $X$, namely $p(1) = .4$, $p(0) = .6$:

$$\mu = 1(.4) + 0(.6) = .4 \qquad E(X^2) = 1^2(.4) + 0^2(.6) = .4 \qquad \sigma^2 = .4 - (.4)^2 = .24, \quad \sigma = \sqrt{.24}$$

Thus $\mu = .4$, $\sigma = \sqrt{.24}$, and finally we obtain

$$E(F) = .4 \qquad \sigma(F) = \frac{\sqrt{.24}}{\sqrt{50}} = .069$$

Thus we not only obtain $E(F) = .4$, which we might have anticipated, but we also obtain the value $\sigma(F) = .069$, which is a measure of how far the observed value of the relative frequency is likely to be from .4. For example, using the very weak Chebyshev inequality for $z = 2$, we find that $|F - .4| < .138$ with probability greater than .75.
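The Chebyshev bound in Example 21 can be compared with an exact calculation. By independence and counting, the probability of exactly $k$ heads in 50 tosses is $\binom{50}{k}(.4)^k(.6)^{50-k}$. The sketch below (Python standard library; variable names are illustrative) uses this to compute $E(F)$, $\sigma(F)$, and the exact probability that $|F - .4| < .138$, which Chebyshev's inequality only bounds below by .75.

```python
from fractions import Fraction
from math import comb, sqrt

n = 50
p = Fraction(2, 5)                  # probability of heads, .4

# Exact distribution of the number of heads k; F = k / n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean_F = sum(Fraction(k, n) * w for k, w in enumerate(pmf))
var_F = sum((Fraction(k, n) - mean_F) ** 2 * w for k, w in enumerate(pmf))

d = Fraction(138, 1000)             # two standard deviations, as in the text
within = sum(w for k, w in enumerate(pmf) if abs(Fraction(k, n) - p) < d)

print(float(mean_F), round(sqrt(var_F), 3), round(float(within), 3))
```

The exact probability comes out well above .75, which illustrates how conservative the Chebyshev estimate is.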


EXERCISES 


1. One hundred people independently shuffle a pack of cards and observe 
whether the top card is a spade or not. The relative frequency of spades is 
then computed. What is the mean relative frequency? What is the standard 
deviation of the relative frequency? [Hint: Let $X(s) = 1$ if $s$ is a spade, $X(s) = 0$ if $s$ is not a spade. Then $\bar{X}$ is the relative frequency.]


2. It is desired to have a large number of people perform the experiment 
of Exercise 1. It is also desired that the relative frequency of spades should almost certainly be between .2 and .3. How many people should we have perform the experiment? (Hint: Take “almost certainly” to mean probability .9, .95, and .99 to obtain 3 answers. Use Chebyshev’s theorem for very conservative answers. In Chapter 7 we shall be less conservative.)


*3. We found that $E\left(\frac{N}{N-1}S^2\right) = \sigma^2$. Explain why $V(S^2)$ or $\sigma(S^2)$ cannot be found in terms of $\mu$ and $\sigma$.


4. Find $E\left(\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2\right)$.


5. A random variable $X$ has mean 50 and standard deviation 2. How many experiments should be performed so that $\bar{X}$ will have standard deviation .4? Using Chebyshev’s theorem, how many experiments should be performed so that $\bar{X}$ is between 49 and 51 with probability at least .9?


6. Suppose $N = 3$ independent trials occur. Define $M_w = \frac16(3X_1 + 2X_2 + X_3)$. Find $E(M_w)$ and $V(M_w)$. Explain why $\bar{x} = \frac13(x_1 + x_2 + x_3)$ is, other things equal, a better estimate for the mean $\mu$ than $m_w = \frac16(3x_1 + 2x_2 + x_3)$. (The number $m_w$ is called a weighted mean.)


*7. Generalize Exercise 6 as follows: If $M_w = \sum_{i=1}^{N} a_iX_i$, where $\sum a_i = 1$, then the choice of $a_i$ that minimizes $\sigma(M_w)$ is $a_i = 1/N$. [Hint: First find $V(M_w)$ and then set $a_i = (1/N) + \delta_i$.]


8. During 5 independent trials, the values of a random variable X were 
observed to be x = 5, 5.2, 5.3, 4.7, and 5. Estimate the mean and variance 
of X. 


*9. What can you say about the value of $E(S)$? If a sample variance $s^2$ is found, how can you use it to estimate $\sigma$? (Hint: Exercise 6, Section 6.1.)


10. A random variable $X$ has $\mu = 10$ and $\sigma = .2$. Ten independent samples of $X$ will be taken and the sample average $\bar{X}$ computed. Give an estimate for the probability that $|\bar{X} - 10| < .4$.


11a. An experiment consists of tossing 2 coins and recording the number $H$ of heads ($H = 0$, 1, or 2). Find $\mu = \mu(H)$ and $\sigma = \sigma(H)$.
b. The experiment is repeated $N = 10$ times. Find $\mu(\bar{H})$ and $\sigma(\bar{H})$.
c. Perform this experiment 10 times and find the sample mean $m$ and the sample variance $s^2$ for your results. Compare $m$ and $\mu$. Compute $\frac{N}{N-1}s^2 = \frac{10}{9}s^2$ and see how it compares to $\sigma^2$.


5 THE LAW OF LARGE NUMBERS 


We now have enough apparatus to consider what happens if the number N 
of experiments is made very large. The following theorem shows how the 
sample mean of a random variable is related to the mean if the number of 
trials is large. 




22 Theorem. Law of Large Numbers 


Let $X$ be a random variable on a finite sample space having mean $\mu$ and standard deviation $\sigma$. Let $d > 0$ be given. Let $p_N$ = probability that $\mu - d < \bar{X} < \mu + d$. Then $p_N \to 1$ as $N \to \infty$.

The limit statement can be paraphrased as follows: Let $d, f > 0$ be given. Then some $N_0$ may be determined so that if $N \ge N_0$, the probability that $\mu - d < \bar{X} < \mu + d$ is at least $1 - f$.⁶ It is this latter statement that we shall use and prove.


Remark. We can thus say that the sample mean $\bar{X}$ will very likely be very close to $\mu$ if a large number of trials are taken. The “very likely” is measured by $f$, the “very close” is measured by $d$, and the “large” is measured by $N_0$. If, for example, $\mu = .3$, $d = .02$, $f = .001$, then we can find $N_0$ so that $.28 < \bar{X} < .32$ with probability at least .999. Also, if more than $N_0$ trials are taken, then $.28 < \bar{X} < .32$ also with probability at least .999. (We have a different $\bar{X}$, because we choose a different $N$.)

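Proof. We note that the inequality $\mu - d < \bar{X} < \mu + d$ means the same as $|\bar{X} - \mu| < d$. According to Chebyshev’s theorem (Theorem 7), the probability that $|\bar{X} - \mu| < z\sigma/\sqrt{N}$ is larger than $1 - (1/z^2)$. ($\sigma/\sqrt{N}$ is used because $\sigma/\sqrt{N}$ is the standard deviation of $\bar{X}$. We are applying Chebyshev’s theorem to $\bar{X}$.) We choose $z$ so that

$$d = \frac{z\sigma}{\sqrt{N}}$$

Solving, we have

$$z = \frac{d\sqrt{N}}{\sigma}$$

Thus

$$1 - \frac{1}{z^2} = 1 - \frac{\sigma^2}{d^2N}$$

We can arrange $1 - 1/z^2$ to be at least $1 - f$ by arranging to have

$$1 - \frac{\sigma^2}{d^2N} \ge 1 - f$$

or

$$f \ge \frac{\sigma^2}{d^2N}$$

Multiplying by $N/f$, this is equivalent to

$$N \ge \frac{\sigma^2}{d^2f} \qquad (6.41)$$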


⁶ Note that $N$ enters the condition $\mu - d < \bar{X} < \mu + d$, because $\bar{X}$ is defined in terms of $N$. The notation $\bar{X}$ does not make this dependence clear.
If this inequality holds, then the required probability will be larger than $1 - (1/z^2)$, hence larger than $1 - f$. This completes the proof.

Note that the estimate is based on Chebyshev’s theorem and hence is very conservative. $N_0$ is taken as any integer greater than or equal to $\sigma^2/d^2f$. For the example given in the above remark, suppose that $\sigma = .1$. Then Inequality 6.41 prescribes

$$N_0 = \frac{.01}{(.001)(.0004)} = 25{,}000$$

Thus the conservative estimate says that if 25,000 or more trials are taken, the sample mean will be in the interval $.28 < \bar{X} < .32$ with probability at least 99.9 percent. (Actually, a more realistic answer is $N = 275$.) As we can see from Inequality 6.41, the number $N$ of samples must be increased if $\sigma$ is increased or if $d$ or $f$ is decreased. This is the price we must pay to get closer to the mean and for a higher probability of attaining that accuracy.
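Inequality 6.41 turns directly into a recipe for $N_0$. The sketch below (Python standard library; `chebyshev_sample_size` is a hypothetical helper) works with exact fractions, so the example in the text ($\sigma = .1$, $d = .02$, $f = .001$) gives exactly 25,000 rather than an off-by-one value caused by binary floating-point rounding. It then shows how $N_0$ grows as $d$ shrinks.

```python
from fractions import Fraction
import math

def chebyshev_sample_size(sigma, d, f):
    """Smallest integer N with N >= sigma^2 / (d^2 f), as in Inequality 6.41.
    Pass decimal strings such as "0.1" so Fraction converts them exactly."""
    sigma, d, f = Fraction(sigma), Fraction(d), Fraction(f)
    return math.ceil(sigma**2 / (d**2 * f))

print(chebyshev_sample_size("0.1", "0.02", "0.001"))   # 25000
for d in ("0.04", "0.02", "0.01"):
    print(d, chebyshev_sample_size("0.1", d, "0.001"))
```

Halving $d$ multiplies $N_0$ by 4, since $d$ enters Inequality 6.41 squared.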

The above theorem relates the sample mean for large samples to the mean. We can now relate the relative frequency of an event to the probability of that event. (Cf. Chapter 1.) The trick is to observe, as in Example 21, that the relative frequency is the sample average of a particularly simple random variable.




23 Theorem. Bernoulli’s Theorem 


Let $S$ be a finite sample space and let $A \subset S$ be an event of $S$ with probability $p(A) = p$. Let $F = F_N$ be the relative frequency of the occurrence of $A$ in $N$ trials. ($F$ is a random variable on $S^N$.) Let $d > 0$ be given and let $p_N$ = probability that $|F_N(s) - p| < d$. Then $p_N \to 1$ as $N \to \infty$.

As in Theorem 22, this theorem can be stated without the use of limits. Thus, if $d$ and $f$ are given positive numbers, there is an $N_0$ such that if $N \ge N_0$, the probability that $|F_N(s) - p| < d$ is at least $1 - f$. This is the form in which the theorem will be proved.

We may loosely say that the relative frequency of $A$ will very likely be very close to $p(A)$ provided a large number of experiments are performed.


Proof. Define the random variable $Y$ on $S$ as follows:

$$Y(s) = \begin{cases} 1 & \text{for } s \text{ in } A \\ 0 & \text{for } s \text{ not in } A \end{cases}$$

Then $\bar{Y}$ is simply the relative frequency:

$$\bar{Y}(s) = F_N(s) \qquad s \in S^N$$

In fact, $\bar{Y}(s) = (1/N)[Y_1(s) + \cdots + Y_N(s)] = (1/N)[Y(s_1) + Y(s_2) + \cdots + Y(s_N)]$ is the number of times $s_i$ is in $A$, divided by $N$. The distribution $p(x)$ of $Y$ is given by the equations $p(1) = p$, $p(0) = 1 - p$. Therefore, $E(Y)$ is simply $\sum x\,p(x) = 1 \cdot p + 0 \cdot (1 - p) = p$. Thus, by Theorem 22, we can find $N_0$ so that

$$|\bar{Y} - p| = |F_N - p| < d$$

with probability at least $1 - f$, for all $N \ge N_0$. This completes the proof.
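Bernoulli's theorem can be watched at work numerically. The sketch below (Python standard library; `prob_within` is a hypothetical helper) computes $p_N = p(|F_N - p| < d)$ exactly from the counting formula $\binom{N}{k}p^k(1-p)^{N-k}$, for an event with $p = .4$ and $d = .03$. It assumes $0 < p < 1$, compares $k/N$ with $p$ using exact fractions to avoid rounding at the end points, and evaluates each term in log space so that large $N$ does not overflow.

```python
from fractions import Fraction
import math

def prob_within(n, p, d):
    """Exact p_N = P(|F_N - p| < d) for the relative frequency F_N of an
    event with probability p (0 < p < 1) in n independent trials.
    p and d are decimal strings so the end-point comparison is exact."""
    p_exact, d_exact = Fraction(p), Fraction(d)
    pf = float(p_exact)
    total = 0.0
    for k in range(n + 1):
        if abs(Fraction(k, n) - p_exact) < d_exact:
            # log of C(n, k) p^k (1 - p)^(n - k)
            log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1)
                        + k * math.log(pf) + (n - k) * math.log1p(-pf))
            total += math.exp(log_term)
    return total

for n in (10, 100, 1000):
    print(n, round(prob_within(n, "0.4", "0.03"), 3))
```

The probabilities increase toward 1 as $N$ grows, as the theorem asserts; they do so much faster than the Chebyshev estimate $1 - p(1-p)/(d^2N)$ would suggest.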

We conclude this chapter with two remarks. First, Theorem 22 has been 
greatly generalized to infinite probability spaces, so it must be regarded 
as a relatively weak form of the law of large numbers. Second, it must not 
be thought that Theorem 23 justifies the notion of statistical probability as a 
limit of relative frequencies, for that theorem makes a statement that some 
inequality holds with probability close to 1. But it is still framed in the 
language of probability theory and so can hardly be used to justify proba- 
bility as a limit of relative frequencies. There are stronger statements 
possible, but the fact remains that one could take a fair coin and toss it once 
a day for 10 years and it might land heads each time. No theory of proba- 
bility has changed that possibility. 


EXERCISES 


1. Compute $\sigma(Y)$ in the proof of Theorem 23, and in this way find the number $N_0$ of experiments necessary to guarantee that the relative frequency of $A$ will be within $d$ of the probability $p(A)$, with probability greater than $1 - f$.


2. In Exercise 1 find a number $N_0$ that is independent of $p$ and depends only on $d$ and $f$. [Hint: $p - p^2 = \frac14 - (\frac12 - p)^2$.]


3. You have a die (possibly loaded), and you want to demonstrate, ex- 
perimentally, that if it is tossed independently for a large number of times 
the relative frequency will have some limit. Therefore, you announce to the 
class: “I am going to toss this die N times and compute the relative frequency of the event ‘1 or 6.’ Tomorrow, I will again toss it N times and
compute the same relative frequency. I predict that the two relative fre- 
quencies will be within .01 of each other.” As a teacher you are willing to 
have your prediction fail, but you want at least 90 percent chance of success. 
What number N do you announce? (Use Chebyshev’s estimate, and try 
to make both relative frequencies within .005 of the mean.) 


4. In Exercise 3, let $F_1$ be the relative frequency on the first day and let $F_2$ be the relative frequency on the second day. Let $E = F_1 - F_2$. Using Exercise 9 of Section 3 ($a = 1$, $b = -1$), find $N$ so that $|E| < .01$. Explain why this $N$ is smaller than the $N$ found in Exercise 3.


CHAPTER 7 SOME STANDARD DISTRIBUTIONS


1 CONTINUOUS DISTRIBUTIONS


So far, the distributions $p(x)$ that we have considered were defined for finitely many values $x_1, \ldots, x_n$. We shall now consider distributions $f(x)$ defined for all real numbers $x$ or for real numbers $x$ in some interval. This is a big step to take, so it is worth pausing to ask why we do so.

One could argue that this step is not really necessary. After all, laboratory experiments effectively involve only a finite number of possibilities. Furthermore, even if there is a continuum of possibilities (say, in determining the length of a piece of wire), our measurements are only good to so many decimal places. It would appear to be unnecessary to consider all possibilities. For example, if a piece of wire is to be measured in a laboratory, if it is known that it is about 10 cm long, and if measurements are accurate to 2 decimal places, then the variability in the measurement could be expressed by a distribution $p(x)$, where $x = 8.00, 8.01, \ldots, 10.00, \ldots, 11.99, 12.00$. However, it turns out that it is often even simpler to consider a continuous distribution $f(x)$, where $x$ is a real variable.

Continuous and discrete mathematics each throw light on the other. For 
example, modern high-speed (digital) computers always reduce continuous 
problems to discrete ones. On the other hand, many discrete physical situa- 
tions are greatly simplified by working continuously. A chemist, for example, 
will analyze the mass of a certain amount of radium rather than the number 
of atoms present. 
1 Example

A real number $x$ is chosen uniformly and at random between 0 and 3. What is the probability that $1 < x < 2$? Similarly, find the probability that $2\tfrac12 < x < 5$.

Method. The given interval $0 < x < 3$ has length 3, while the interval $1 < x < 2$ has length 1. Intuitively we would expect the probability that $x$ is in any interval to be proportional to the length of that interval. Therefore, we expect the probability that $1 < x < 2$ [written $p(1 < x < 2)$] to be $\tfrac13$. In general, if $a$ and $b$ are in the given interval from 0 to 3, then $p(a < x < b) = \tfrac13(b - a)$, because the interval $a < x < b$ has length $b - a$, and the original interval from 0 to 3 has length 3. To find $p(2\tfrac12 < x < 5)$, we note that $x$ cannot be larger than 3. Thus $p(2\tfrac12 < x < 5) = p(2\tfrac12 < x < 3)$. Since $3 - 2\tfrac12 = \tfrac12$, we have $p(2\tfrac12 < x < 5) = (\tfrac13)(\tfrac12) = \tfrac16$.

There is a very useful geometric way of picturing the probability $p(a < x < b) = \tfrac13(b - a)$. In Fig. 7.1 we draw the graph of $y = \tfrac13$ for $0 \le x \le 3$. Then the value $b - a$ is the length of the interval from $a$ to $b$, and the probability $\tfrac13(b - a)$ is the area under the graph $y = \tfrac13$ and between the values $x = a$ and $x = b$. If we define $y = 0$ for $x < 0$ and for $x > 3$ (the darkened lines in Fig. 7.1), the area also gives the correct formula for any $a$ and $b$. Thus the area under the graph from $x = 2\tfrac12$ to $x = 5$ is $(\tfrac13)(\tfrac12) = \tfrac16$, because there is zero-area contribution from $x = 3$ to $x = 5$.


7.1 Distribution for a Uniform Random Variable Between 0 and 3


For this example we define $f(x)$ as follows:

$$f(x) = \begin{cases} \tfrac13 & \text{for } 0 \le x \le 3 \\ 0 & \text{for } x < 0 \text{ and for } x > 3 \end{cases}$$


Then, when $x$ is chosen uniformly between 0 and 3, the probability $p(a < x < b)$ is given as the area between the graph of $y = f(x)$ and the $x$ axis, which is bounded on the left by $x = a$ and on the right by $x = b$. Note that the total area under the graph is 1. Thus $x$ falls somewhere with probability 1.
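The area rule for a uniform variable is easy to express directly. The sketch below (Python standard library; `uniform_prob` is a hypothetical helper) returns $p(a < x < b)$ for $x$ uniform on $[0, 3]$ as the length of the part of $(a, b)$ lying inside $[0, 3]$, divided by 3. Setting the density to 0 outside $[0, 3]$ is what makes intervals that stick out past the ends come out right.

```python
from fractions import Fraction

def uniform_prob(a, b, low=0, high=3):
    """p(a < x < b) for x uniform on [low, high]: the length of the part of
    (a, b) that lies inside [low, high], divided by high - low."""
    overlap = min(Fraction(b), high) - max(Fraction(a), low)
    return max(overlap, Fraction(0)) / (high - low)

print(uniform_prob(1, 2))     # the first question of Example 1
print(uniform_prob(2, 5))     # only the part 2 < x < 3 contributes
```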
For finite distributions $p(x)$ ($x = x_1, \ldots, x_n$), we have two fundamental properties:

$$p(x_i) \ge 0 \qquad (i = 1, \ldots, n) \qquad (7.1)$$

$$\sum_{i=1}^{n} p(x_i) = 1 \qquad (7.2)$$

We note that for the function $f(x)$ defined above, we had $f(x) \ge 0$, in analogy with condition 7.1. The area under the graph $y = f(x)$ was 1. This is in analogy with condition 7.2.

We can generalize the above example greatly to arbitrary continuous 
distributions. We first introduce a notation for area. 


2 Definition 

Let $y = f(x)$ be a function of $x$ such that $f(x) \ge 0$. If $a < b$, we define $A_f(a, b)$ to be the area between the graph $y = f(x)$ and the $x$ axis, to the right of $x = a$ and to the left of $x = b$.¹ (See Fig. 7.2.)

7.2 Area Under a Curve


In this text, areas under curves will be found directly from tables, unless 
(as in Example 1) these areas can be easily computed using elementary 
geometry. We now generalize Example 1 to arbitrary continuous dis- 
tributions. 


3 Definition 
A continuous probability density is a function $f(x)$ such that

$$f(x) \ge 0 \qquad (7.3)$$

$$A_f(-\infty, +\infty) = 1 \qquad (7.4)$$

¹ In calculus notation this area is written $\int_a^b f(x)\,dx$.

² Condition 7.4 says that the entire area under the curve $y = f(x)$ is 1.
We say that the variable $x$ has the probability density $f(x)$ if

$$p(a < x < b) = A_f(a, b) \qquad (7.5)$$

for all choices of $a$ and $b$.

In Example 1 the variable $x$ had the probability density $f(x)$, where $f(x) = \tfrac13$ for $0 \le x \le 3$ and $f(x) = 0$ for $x < 0$ and for $x > 3$. Sometimes $f(x)$ is only defined in an interval $a_0 \le x \le b_0$. In that case we can always define $f(x) = 0$ outside that interval so that $A_f(-\infty, +\infty)$ has a meaning for every $f$. Equation 7.4 then becomes $A_f(a_0, b_0) = 1$, because there is no contribution of area outside the interval.


4 Example 

A die is tossed. If it lands 5 or 6, a number x is chosen uniformly at random 
in the interval 0 to 3. If it lands 1, 2, 3, or 4, a number x is chosen uniformly 
at random in the interval 3 to 4. Find the density function for x. Use it to 
find p(2 < x < 3.6). 


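Method. The density function is as in Fig. 7.3, because $x$ is uniformly distributed in $0 < x < 3$ and in $3 < x < 4$. Let $c$ be the height of the density on $0 < x < 3$ and $d$ its height on $3 < x < 4$. Since $p(0 < x < 3) = \tfrac13$ (the probability that the die lands 5 or 6), we have $3c = \tfrac13$ and $c = \tfrac19$. Similarly, $1 \cdot d = d = \tfrac23$. Thus $f(x) = \tfrac19$ for $0 \le x \le 3$, $f(x) = \tfrac23$ for $3 < x \le 4$. Thus $p(2 < x < 3.6) = A_f(2, 3.6) = \tfrac19 + \tfrac23(.6) = .111 + .4 = .511$.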


7.3 
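Example 4 generalizes to any density that is constant on a few intervals. The sketch below assumes a representation that is not from the text: a list of `(left, right, height)` pieces with $f(x) = 0$ elsewhere (the names `EXAMPLE_4` and `density_area` are hypothetical). $A_f(a, b)$ is then a sum of width times height over the parts of the pieces lying between $a$ and $b$.

```python
from fractions import Fraction

# Assumed representation: piecewise-constant density as (left, right, height)
# pieces, with f(x) = 0 outside all pieces. Heights from Example 4.
EXAMPLE_4 = [(0, 3, Fraction(1, 9)), (3, 4, Fraction(2, 3))]

def density_area(pieces, a, b):
    """A_f(a, b) for a piecewise-constant density: sum of width * height
    over the part of each piece that lies between a and b."""
    total = Fraction(0)
    for left, right, height in pieces:
        overlap = min(b, right) - max(a, left)
        if overlap > 0:
            total += overlap * height
    return total

print(density_area(EXAMPLE_4, 0, 4))                  # total area 1
print(density_area(EXAMPLE_4, 2, Fraction(18, 5)))    # p(2 < x < 3.6)
```

The second result is $\tfrac{23}{45} = .511$, matching the method above; checking that the total area is 1 is a quick test that the heights satisfy Equation 7.4.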


5 Example 
The probability distribution $f(x)$ is given by the formulas

$$f(x) = \begin{cases} x & 0 \le x \le 1 \\ 2 - x & 1 < x \le 2 \end{cases}$$

Find $p(\tfrac12 \le x < 1\tfrac23)$.

Method. The graph consists of the 2 line segments of Fig. 7.4. Note that the total area under the “curve” $y = f(x)$ is 1. It is required to compute the shaded area. We use the formula $K = \tfrac12 h(b_1 + b_2)$ for the area of a trapezoid to obtain

$$A = (\tfrac12)(\tfrac12)(1 + \tfrac12) = \tfrac38$$

$$B = (\tfrac12)(\tfrac23)(1 + \tfrac13) = \tfrac49$$

The required probability is thus $A_f(\tfrac12, 1\tfrac23) = A + B = \tfrac38 + \tfrac49 = \tfrac{59}{72}$ $(= .82)$.


7.4 Distribution 
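For a density made of straight line segments, the trapezoid formula gives areas exactly. The sketch below (Python standard library; `f` and `trapezoid_area` are hypothetical names) takes the density of Example 5 to be $f(x) = x$ on $[0, 1]$ and $f(x) = 2 - x$ on $[1, 2]$, an assumption consistent with total area 1 and with the two line segments of Fig. 7.4. It cuts $[a, b]$ at the break points and adds the trapezoids $\tfrac12 h(b_1 + b_2)$.

```python
from fractions import Fraction

def f(x):
    """Assumed density of Example 5: rises from 0 to 1, then falls back to 0."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0

def trapezoid_area(a, b, breakpoints=(0, 1, 2)):
    """A_f(a, b), exact because f is linear between the break points:
    each piece is a trapezoid with area (1/2) h (b1 + b2)."""
    cuts = sorted({a, b, *(c for c in breakpoints if a < c < b)})
    return sum(Fraction(1, 2) * (right - left) * (f(left) + f(right))
               for left, right in zip(cuts, cuts[1:]))

print(trapezoid_area(0, 2))                              # total area 1
print(trapezoid_area(Fraction(1, 2), Fraction(5, 3)))    # 59/72, about .82
```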


6 Example 


An integer $N$ from 1 through 3,000 is chosen at random. Find the probability that $N$ is in the range 200 through 2,200.

Method. We can compute $p(200 \le N \le 2{,}200)$ explicitly. Each integer has probability 1/3,000. There are $2{,}200 - 200 + 1 = 2{,}001$ integers in the given range. Thus the probability is $2{,}001/3{,}000 = .667$ (exactly).

We can approximate this problem continuously. Let $x = N/1{,}000$. Then $x = .001, .002, \ldots, 3.000$, with equal probabilities. We therefore think of choosing a continuous variable $x$ uniformly in the range $0 \le x \le 3$, leading to the distribution of Example 1. The condition $200 \le N \le 2{,}200$ is, after dividing by 1,000 and setting $N/1{,}000 = x$, equivalent to $.2 \le x \le 2.2$. We are thus led to Fig. 7.5, where $A(.2, 2.2)$ is seen to be $(2.0)(\tfrac13) = \tfrac23 = .667$ (to 3 decimal places). Thus the continuous approximation of the finite distribution yields the correct answer to 3 decimal places and gives a rather nice picture of what is happening.

7.5

Two features of this simple example will carry over to later, more complicated, situations. First, the change of scale in which $N$ was converted into $x = N/1{,}000$ served to compress the values of $N$ closer together so that a continuous approximation was feasible. Figures 7.6a, 7.6b, and 7.6c show the distributions for $N$, $N/1{,}000$, and $x$. (The first 2 are discrete, and the last is continuous.) The difference between Figs. 7.6b and 7.6c is very pronounced. Note that the heights 1/3,000 barely show up in Fig. 7.6b. (These heights are actually exaggerated there.) The height $\tfrac13$ in Fig. 7.6c is more tangible. Why is this? In Fig. 7.6b the way to compute the required probability is to sum the many heights $p(x)$ at $x = .200, .201, \ldots, 2.200$. But in Fig. 7.6c the probability was found as an area. For a probability density $f(x)$ it is the area, not the height, which is a probability.


7.6a Distribution for $N$

7.6b Distribution for $N/1{,}000$

7.6c Probability Density for $x$
The second feature of this example that will carry over to other examples is the indeterminacy of the end points for the discrete variable $N$. For example, instead of writing $220 \le N \le 2{,}200$, we could just as well write $219 < N < 2{,}201$, or $219.7 < N < 2{,}200.8$. All these inequalities yield $N = 220, 221, \ldots, 2{,}200$, because $N$ is an integer. Since the lower bound of $N$ can be as large as 220 (inclusive) or as small as 219 (exclusive), a good approximation will be obtained if the average 219.5 is used. Similarly, 2,200.5 is used as the upper bound. The inequality $219.5 < N < 2{,}200.5$, after dividing by 1,000, leads to $.2195 < x < 2.2005$. In this case $A(.2195, 2.2005)$ will actually yield the exact answer for the probability that $220 \le N \le 2{,}200$. The probability that $N = 500$ (for example) can be found similarly. Merely write $N = 500$ as the inequality $499.5 < N < 500.5$ and divide by 1,000 to obtain the equivalent inequality $.4995 < x < .5005$, whose probability is $A(.4995, .5005) = (\tfrac13)(.0010) = 1/3{,}000$.
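Both features of Example 6 can be compared side by side. The sketch below (Python standard library; `exact_prob` and `continuous_prob` are hypothetical helpers) computes the exact discrete probability for $N$ uniform on $1, \ldots, 3{,}000$ and the continuous area for $x = N/1{,}000$, with or without moving each end point out by $\tfrac12$ as described above.

```python
from fractions import Fraction

TOTAL = 3000          # N is chosen uniformly from 1, 2, ..., 3,000

def exact_prob(lo, hi):
    """p(lo <= N <= hi) for integers 1 <= lo <= hi <= 3000."""
    return Fraction(hi - lo + 1, TOTAL)

def continuous_prob(lo, hi, correction=True):
    """Area under the density 1/3 on [0, 3] for x = N / 1000. With
    correction=True the end points are moved out by 1/2 before dividing."""
    half = Fraction(1, 2) if correction else Fraction(0)
    x_lo = (Fraction(lo) - half) / 1000
    x_hi = (Fraction(hi) + half) / 1000
    return (min(x_hi, Fraction(3)) - max(x_lo, Fraction(0))) / 3

print(exact_prob(200, 2200), continuous_prob(200, 2200, correction=False))
print(exact_prob(220, 2200), continuous_prob(220, 2200))
print(continuous_prob(500, 500))
```

Without the half-unit adjustment the area is $\tfrac23$, close to but not equal to $\tfrac{2001}{3000}$; with it, the area matches the exact count, and even a single value such as $N = 500$ gets probability $1/3{,}000$.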


EXERCISES 


1. A probability density is given by the equation $f(x) = \tfrac14$ for $-2 \le x \le 2$. Find

a. $p(1.3 < x < 1.9)$ b. $p(-1.2 < x < 1.1)$
c. $p(1.3 \le x < 7)$ d. $p(-2.6 < x < -1.3)$

2. A probability density is given by the equations $f(x) = \tfrac12$ for $0 \le x \le 1$ and $f(x) = \tfrac14$ for $1 < x \le 3$. Graph $y = f(x)$. Find p(s < x < 8). Find $A_f(0, 2)$ and $A_f(.5, 2.5)$.


3. If $f(x)$ is any probability density and $a < b < c$, explain why $A_f(a, c) = A_f(a, b) + A_f(b, c)$. What is the probabilistic significance of this equation? What is the analogue for a finite distribution $p(x)$ ($x = x_1, \ldots, x_n$)?


4. The function $y = kx$ is a probability density in the interval $0 \le x \le 4$. Find $k$.


5. A die is tossed. If it lands 3 or more, a number x is chosen at random 
between 0 and 1. If it lands 1 or 2, x is chosen at random between 1 and 2.
Find the probability density for x. Find the probability that < x < 14. 


6. As in Exercise 5, suppose that x is chosen at random between 0 and 2 
if the die lands 3 or more, and that x is chosen at random between 1 and 3 if
the die lands 1 or 2. Find the probability density for x. Using this density, 
find p( < x < 2). Similarly, find p(14 < x < 23). 
2 NORMAL DISTRIBUTION


One of the distributions that frequently arises in applications is the normal 
distribution. One important application of this continuous distribution is to 
approximate a discrete distribution. 


7 Definition 


The standardized normal probability density function is the function $n(z)$ given by the formula

$$n(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2} \qquad (7.6)$$

The normal curve $y = n(z)$ is the familiar bell-shaped curve so well loved by psychologists, mathematicians, biologists, and educators. Its graph is given in Fig. 7.7, along with some typical values for $y$. Here $\pi = 3.14159\ldots$, of geometry fame, while $e = 2.71828\ldots$ is the number introduced in Chapter 4. We shall never again be concerned with Equation 7.6. Its values have been extensively computed and tabulated, as have the areas $A_n(a, b)$. We shall shortly learn how to find these areas from tables.
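Although the text uses Equation 7.6 only through tables, its values are easy to reproduce. The sketch below (Python standard library; the function name `n` simply mirrors the text's notation) evaluates $n(z)$ at a few points, including the values quoted for $z = 4$ and $z = 5$ in the next paragraph.

```python
import math

def n(z):
    """Standardized normal density, Equation 7.6."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0, 1, 2, 3, 4, 5):
    print(z, f"{n(z):.7f}")
```

Because $z$ enters only through $z^2$, the same function also makes the symmetry $n(-z) = n(z)$ evident.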


7.7 Standardized Normal Curve


As we can see from Fig. 7.7, the normal curve is symmetric about the $y$ axis, and $y$ rapidly approaches 0 as $z$ approaches $\pm\infty$. In fact, even $z = 4$ gives $y = .0001$, and when $z = 5$, $y = .0000015$. Even though the curve extends to infinity, it dies down so rapidly that for most purposes we need only consider values of $z$ between $-3$ and 3. Before using the normal distribution in what follows, it will be necessary to take for granted several facts about it. We state them here.

1. The area under the curve $y = n(z)$ is 1. [$A_n(-\infty, +\infty) = 1$.] This is difficult to prove and is a very deep result.

2. The curve $y = n(z)$ is symmetric about the $y$ axis. Thus computations involving $n(z)$ may be performed for $z \ge 0$, since the graph for $z < 0$ is the mirror image, about the $y$ axis, of the curve for $z \ge 0$. In particular, $A_n(0, \infty) = \tfrac12$.

3. The values of the areas $A_n(0, z)$ have been computed and can be obtained from Appendix D. We write $A_n(0, z)$ briefly as $N(z)$. A brief summary is given in Fig. 7.8.

7.8 Areas Under the Standardized Normal Curve


4. The normal distribution has mean 0 and standard deviation 1. In this 
text we have not even defined the mean and standard deviation of a con- 
tinuous distribution, so this statement is, within the context of this text, 
meaningless. However, a definition can be given for continuous distribu- 
tions, in which the normal distribution turns out to have mean 0 and standard 
deviation 1. 
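For readers without Appendix D at hand, the areas $N(z) = A_n(0, z)$ can be computed from the error function, using the identity $N(z) = \tfrac12\,\mathrm{erf}(z/\sqrt{2})$. The sketch below (Python standard library; `N` mirrors the text's notation and `normal_area` is a hypothetical helper) extends $N$ to negative $z$ by $N(-z) = -N(z)$, which is fact 2 in another form, so that $A_n(a, b) = N(b) - N(a)$ for any $a < b$. The printed values should agree with a four-place table.

```python
import math

def N(z):
    """A_n(0, z), the area under the standardized normal curve from 0 to z,
    via the error function. For z < 0 this returns -N(-z), by symmetry."""
    return 0.5 * math.erf(z / math.sqrt(2))

def normal_area(a, b):
    """A_n(a, b) for a < b; a or b may be -math.inf or math.inf."""
    return N(b) - N(a)

for z in (1, 2, 3):
    print(z, round(N(z), 4))
print(normal_area(-math.inf, math.inf))      # fact 1: total area 1
print(N(math.inf))                           # fact 2: A_n(0, infinity) = 1/2
```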

We can now define a normal distribution with arbitrary mean $\mu$ and standard deviation $\sigma$.
8 Definition
Suppose the variable x has a probability distribution f(x). Suppose also that
μ and σ > 0 are two numbers such that, for all a < b,

$$p(\mu + a\sigma < x < \mu + b\sigma) = A_n(a, b) \qquad (7.7)$$

where A_n(a, b) is the area function for the standardized normal distribution.
We then say that x is normally distributed with mean μ and standard
deviation σ.

Another way of looking at this definition is as follows. Suppose we set

$$x = \mu + z\sigma \qquad (7.8)$$

so that z = (x − μ)/σ. Then the condition μ + aσ < x < μ + bσ is equivalent
to a < z < b. The variable z, already used in the section on Chebyshev’s
theorem, measures how many standard deviations x is away from the mean.
Definition 8 can therefore be stated as follows: x is normally distributed with
mean μ and standard deviation σ > 0 provided z has the standardized normal
probability density.

The relationship between x and z, and the probabilities p(μ + aσ < x <
μ + bσ) and p(a < z < b), are illustrated in Fig. 7.9. The standardized
normal curve y = n(z) is shifted so that it is centered over μ. The z axis
is then stretched by a factor σ, and the vertical coordinates are compressed
by the same factor (thereby keeping areas the same). The figure
illustrates p(μ − σ ≤ x ≤ μ + 2σ) = A_n(−1, 2).


[Figure: normal curve for x, with the area between μ − σ and μ + 2σ shaded]

7.9 Relationship Between Normal Variable x and Standardized Variable z


As a special case of Definition 8, we note that a variable z which has the
standardized normal probability distribution is normally distributed with
mean μ = 0 and standard deviation σ = 1. In fact (cf. Equation 7.7),
p(0 + a·1 < z < 0 + b·1) = p(a < z < b) = A_n(a, b).
Appendix D gives the values of p(0 ≤ z ≤ c) = N(c) for values of c.
Example 9 shows how to compute probabilities p(a ≤ z ≤ b) for all values
of a and b.
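Appendix D is not reproduced in this section. As a hedged sketch (not the method used to compute the book's table), N(c) can be evaluated with the error function in Python's standard library, using the standard identity N(c) = ½ erf(c/√2). The function names N and A_n are chosen here only to mirror the text's notation.

```python
import math

def N(c: float) -> float:
    """Area under the standardized normal curve y = n(z) from 0 to c.

    Uses N(c) = erf(c / sqrt(2)) / 2. For c < 0 the result is negative,
    consistent with A_n(0, c) = -A_n(c, 0).
    """
    return 0.5 * math.erf(c / math.sqrt(2.0))

def A_n(a: float, b: float) -> float:
    """Area under y = n(z) between z = a and z = b; a or b may be infinite."""
    return N(b) - N(a)

# A few values that would otherwise be read from Appendix D
for c in (0.5, 1.0, 1.1, 2.0, 2.1):
    print(f"N({c}) = {N(c):.4f}")
print("total area:", A_n(-math.inf, math.inf))
```

Because erf is computed to nearly full machine precision, these values agree with a four-place table up to rounding in the last digit.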


9 Example

A variable z is normally distributed with mean 0 and standard deviation 1.
Find the probabilities (a) p(1.1 ≤ z ≤ 2.1), (b) p(|z| ≤ 2), (c) p(z ≥ 1.6),
(d) p(−1.2 ≤ z ≤ 2.4), and (e) p(−1 ≤ z ≤ −.5).

Method. The value p(1.1 ≤ z ≤ 2.1) = A_n(1.1, 2.1) is the area between
z = 1.1 and z = 2.1 and may be found as the difference between A_n(0, 2.1) and
A_n(0, 1.1):

A_n(1.1, 2.1) = N(2.1) − N(1.1)

From Appendix D we obtain
a. p(1.1 ≤ z ≤ 2.1) = N(2.1) − N(1.1) = .4821 − .3643 = .1178


To find p(|z| ≤ 2) = p(−2 ≤ z ≤ 2), we use symmetry to find A_n(−2, 2).
Using Appendix D we obtain

b. p(|z| ≤ 2) = A_n(−2, 2) = 2N(2) = (2)(.4772) = .9544


To find p(z ≥ 1.6) = p(1.6 ≤ z < ∞), we note that A_n(0, ∞) = .5000. Thus
A_n(1.6, ∞) = .5000 − N(1.6), and

c. p(z ≥ 1.6) = A_n(1.6, ∞) = .5000 − .4452 = .0548


The value of p(−1.2 ≤ z ≤ 2.4) is best found by finding the areas A_n(−1.2, 0)
and A_n(0, 2.4) and adding. We find the former area by symmetry:
A_n(−1.2, 0) = A_n(0, 1.2) = N(1.2). Thus

d. p(−1.2 ≤ z ≤ 2.4) = N(1.2) + N(2.4) = .3849 + .4918 = .8767
Finally, A_n(−1, −.5) = A_n(.5, 1) by symmetry. Using the method in part
a, this value is N(1) − N(.5). Thus

e. p(−1 ≤ z ≤ −.5) = N(1) − N(.5) = .3413 − .1915 = .1498
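The five parts of Example 9 can be checked with an erf-based N(c), under the stated assumption that erf stands in for Appendix D. The dictionary keys are just the part letters. Any differences show up only in the fourth decimal place, because the text combines rounded table entries.

```python
import math

def N(c):
    # Area under the standardized normal curve from 0 to c (negative when c < 0)
    return 0.5 * math.erf(c / math.sqrt(2.0))

# Each part of Example 9, computed exactly as in the Method
results = {
    "a": N(2.1) - N(1.1),   # p(1.1 <= z <= 2.1)
    "b": 2 * N(2.0),        # p(|z| <= 2), by symmetry
    "c": 0.5 - N(1.6),      # p(z >= 1.6), using A_n(0, inf) = .5
    "d": N(1.2) + N(2.4),   # p(-1.2 <= z <= 2.4), splitting at z = 0
    "e": N(1.0) - N(0.5),   # p(-1 <= z <= -.5) = A_n(.5, 1) by symmetry
}
table_answers = {"a": 0.1178, "b": 0.9544, "c": 0.0548, "d": 0.8767, "e": 0.1498}

for part, value in results.items():
    print(f"{part}. {value:.4f}  (text: {table_answers[part]:.4f})")
```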


One of the important properties of a normally distributed variable is that
its values lie close to the mean with a probability significantly higher than
the conservative lower bounds given by Chebyshev’s theorem.


10 Example

Suppose z is normally distributed, with mean 0 and standard deviation 1. For
what value c is p(|z| ≤ c) = .50? Similarly, find values c for which
p(|z| ≤ c) = .90, .95, and .99.


Method. By symmetry, p(|z| ≤ c) = p(−c ≤ z ≤ c) = 2N(c). Thus the
problem amounts to solving the equations 2N(c) = .50, 2N(c) = .90, etc.

For 2N(c) = .50, N(c) = .2500, and we find from Appendix D that
c = .67. For 2N(c) = .90, N(c) = .4500, and c = 1.65. Similarly, 2N(c) =
.95 yields c = 1.96 and 2N(c) = .99 yields c = 2.58. Summarizing,

p(|z| ≤ .67) = .50
p(|z| ≤ 1.65) = .90
p(|z| ≤ 1.96) = .95
p(|z| ≤ 2.58) = .99
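Reading Appendix D backward, as Example 10 does, amounts to solving 2N(c) = prob for c. A minimal sketch, assuming erf in place of the table and using plain bisection. The helper name central_c is hypothetical, and the bracket [0, 10] is an arbitrary but safe choice.

```python
import math

def N(c):
    # Area under the standardized normal curve from 0 to c
    return 0.5 * math.erf(c / math.sqrt(2.0))

def central_c(prob, lo=0.0, hi=10.0, tol=1e-10):
    """Solve 2 N(c) = prob for c >= 0 by bisection.

    2 N(c) = p(|z| <= c) increases with c, so the root is bracketed by
    [lo, hi] whenever 0 < prob < 2 N(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 2 * N(mid) < prob:
            lo = mid      # not enough area yet: the root lies to the right
        else:
            hi = mid      # too much area: the root lies to the left
    return (lo + hi) / 2

for prob in (0.50, 0.90, 0.95, 0.99):
    print(f"p(|z| <= c) = {prob:.2f} for c = {central_c(prob):.3f}")
```

Bisection works here only because 2N(c) increases steadily with c. It shows that the value for .90 is about 1.645, which a two-place table lookup reports as 1.65.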


Thus half the area under the standardized normal curve is in the range
|z| ≤ .67. Similarly, 95 percent of the area is in the range |z| ≤ 1.96. For a
normally distributed variable x with arbitrary mean μ and standard deviation
σ, we can say that x is within .67σ of μ with probability .50. Similarly,
p(|x − μ| ≤ 1.96σ) = .95, etc.

It is worth comparing the values p(|x − μ| ≤ cσ) = p(|z| ≤ c) for a
normally distributed variable with the bounds given by Chebyshev’s
theorem. Chebyshev’s theorem states that p(|x − μ| < zσ) is at least
1 − (1/z²). (We proved this for a discrete distribution attaining only a finite
number of values x = x_1, ..., x_n.) We have pointed out that this is a
conservative lower bound, because it is valid for all random variables. Let us
now compare this estimate with actual values for a normal distribution.

We have p(|z| ≤ 2) = (2)(.4772) = .9544. Chebyshev’s theorem gives
1 − 1/4 = .7500 as a lower bound. Similarly, p(|z| ≤ 3) = (2)(.4987) = .9974,
with a Chebyshev bound of 1 − 1/9 = .89. Thus, if x is normally distributed,
|x − μ| < 3σ with probability 99.7 percent. The best that could be said of a
variable x with an arbitrary distribution is that |x − μ| < 3σ with probability
at least 88.9 percent. The values of p(|z| ≤ c) for a standardized normally
distributed variable z (mean 0, standard deviation 1) are plotted for various
values of c and are compared with the Chebyshev bounds in Fig. 7.10.


[Figure: p(|z| ≤ c) plotted against c for 0 ≤ c ≤ 3, together with the Chebyshev lower bound for p(|z| ≤ c)]

7.10 p(|z| ≤ c) for a Normal Distribution
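The comparison plotted in Fig. 7.10 can also be tabulated directly. In this sketch (assuming erf as a stand-in for Appendix D), normal_within(c) is p(|z| ≤ c) = 2N(c), and chebyshev_bound(c) is the Chebyshev lower bound 1 − 1/c², clipped at 0 because the bound says nothing for c ≤ 1. Both names are illustrative.

```python
import math

def normal_within(c):
    # p(|z| <= c) for a standardized normal z; equals 2 N(c) = erf(c / sqrt 2)
    return math.erf(c / math.sqrt(2.0))

def chebyshev_bound(c):
    # Chebyshev lower bound 1 - 1/c^2, clipped at 0 (it says nothing for c <= 1)
    return max(0.0, 1.0 - 1.0 / c**2)

for c in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"c = {c}: normal {normal_within(c):.4f}, "
          f"Chebyshev bound {chebyshev_bound(c):.4f}")
```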


11 Example


The measurement d of the diameter of a precision ball bearing is known to be
normally distributed with μ = 3.40 cm and σ = .03 cm. a. Find the probability
that a measurement is larger than 3.35 cm. b. Find the probability that a
measurement is between 3.33 and 3.43 cm.


Method. Using Equation 7.8 (with d instead of x), we obtain
d = 3.40 + .03z
Then the condition d > 3.35 can be expressed as a condition on z:

$$\begin{aligned} d &> 3.35 \\ 3.40 + .03z &> 3.35 \\ .03z &> -.05 \\ z &> -5/3 = -1.67 \end{aligned}$$

Since z is normally distributed (mean 0, standard deviation 1), p(d > 3.35) =
p(z > −1.67) = .5000 + .4525 = .9525. Similarly, the condition
3.33 ≤ d ≤ 3.43 is expressible as an inequality in z:

$$\begin{aligned} 3.33 \le 3.40 + .03z &\le 3.43 \\ -.07 \le .03z &\le .03 \\ -7/3 = -2.33 \le z &\le 1 \end{aligned}$$

Thus
p(3.33 ≤ d ≤ 3.43) = p(−2.33 ≤ z ≤ 1) = .4901 + .3413 = .8314

Thus d > 3.35 with probability .9525, and d is between 3.33 and 3.43 with
probability .8314.
12 Example

A factory fills sacks of fine sand. The amount of sand is known to average
51 lb with a standard deviation of ½ lb. Assuming that the amount of sand
in a sack is normally distributed, what percentage of the factory’s output will
weigh 50 lb or more?

Method. The amount s of sand is normally distributed with μ = 51, σ = ½.
We wish to find p(s ≥ 50). Using Equation 7.8 with s instead of x, we set
s = 51 + ½z. Thus

$$\begin{aligned} p(s \ge 50) &= p(51 + \tfrac{1}{2}z \ge 50) \\ &= p(\tfrac{1}{2}z \ge -1) \\ &= p(z \ge -2) \end{aligned}$$

Since z is normally distributed, we compute

p(z ≥ −2) = .5000 + .4772 = .9772 = 97.72%
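Examples 11 and 12 follow one pattern: convert the limits on x to limits on z with Equation 7.8, then take an area under the standardized curve. The sketch below packages that pattern in a hypothetical helper, normal_prob, with erf in place of Appendix D. Strict and non-strict inequalities give the same answer for a continuous variable.

```python
import math

def normal_prob(lo, hi, mu, sigma):
    """p(lo <= x <= hi) for x normally distributed with mean mu, sd sigma.

    Standardizes with z = (x - mu) / sigma, i.e. Equation 7.8 solved for z.
    Either limit may be math.inf or -math.inf.
    """
    def Phi(z):
        # Area under the standardized normal curve to the left of z
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)

# Example 11: ball-bearing diameters, mu = 3.40 cm, sigma = .03 cm
bearing_a = normal_prob(3.35, math.inf, 3.40, 0.03)   # p(d > 3.35)
bearing_b = normal_prob(3.33, 3.43, 3.40, 0.03)       # p(3.33 <= d <= 3.43)

# Example 12: sacks of sand, mu = 51 lb, sigma = 1/2 lb
sand = normal_prob(50, math.inf, 51, 0.5)             # p(s >= 50)

print(f"{bearing_a:.4f} {bearing_b:.4f} {sand:.4f}")
```

The helper does not round z to two places, so part a of Example 11 comes out near .9522 rather than the table-based .9525.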


EXERCISES 


1. Suppose the variable z is normally distributed with μ = 0, σ = 1. Using
Appendix D find
a. p(2 ≤ z ≤ 2.5)     b. p(z ≥ −1.2)
c. p(.91 < z < 1.11)     d. p(z² ≥ 2)


2. Suppose the variable x is normally distributed with mean 95 and
standard deviation 4. Find
a. p(x ≥ 94)     b. p(90 < x < 99)
c. p(x ≥ 105)     d. p(94 < x < 96)


3. The variable x is normally distributed with mean 10 and standard 
deviation .2. Find a range of values for x so that x will fall in this range with 
probability .9. Similarly, find a range of values for x in which x will fall with 
probability .8. 


4. The rocks in a pile have a mean diameter of 15 mm and a standard
deviation of 2 mm. The rocks are put through a sieve which permits any
rock smaller than 17.5 mm in diameter to pass through but which stops
rocks of 17.5 mm or larger. (Assume that the diameters are normally
distributed.) What percentage of rocks pass through?


5. The grades on an examination are normally distributed with mean 
70 percent and standard deviation 7 percent. 
a. One thousand students take the exam. On the average, how many 
students will receive a grade of 85 percent or higher? 
b. It is desired to determine a passing grade so that 90 percent of the 
students will pass. What should the passing grade be? 
c. If 15 percent of the students should receive A’s, what is the lowest
grade that should receive an A?


6. On the average, 90 faculty members will attend a faculty meeting, with 
a standard deviation of 6. (Assume that the attendance is normally dis- 
tributed.) What is the probability that 80 or more faculty members turn up 
at the next meeting? 


7. For the faculty in Exercise 6 it is felt that no meeting should be held
unless at least q members show up (q = quorum), lest democracy not
prevail. (The value q is to be determined.) On the other hand, in the interest
of efficiency, as well as boredom, it is considered always desirable to hold
faculty meetings. Thus democracy and efficiency, respectively, act to
increase and to decrease the correct value of q. A compromise is worked
out whereby it is agreed that the faculty shall fail to have a quorum with
probability 75. What value of q should be used? If you want the faculty to
meet rarely, say with probability 7, what value of q should be used?


3 BINOMIAL DISTRIBUTION 


The binomial distribution is one of the most commonly used discrete 
distributions. We shall describe it by an example involving coin tossing. 


13 Example
A coin has probability p of landing heads and probability q = 1 − p of
landing tails. If n of these coins are tossed independently, find

a. the distribution for the number of heads that turn up. 

b. the mean number of heads. 

c. the standard deviation for the number of heads. 


Method. We let X = number of heads. [X is a random variable on the
space S^n, where S = {H, T} and p(H) = p, p(T) = q.] Clearly, the possible
values of X are x = 0, 1, 2, ..., n. We denote the distribution of X by
b(x; n, p) to bring the given fixed numbers n and p explicitly into the notation.
[Thus b(x; n, p) = probability that X = x, or simply the probability of x
heads.] We are required to find b(x; n, p) for x = 0, 1, ..., n. But b(0; n, p) =
probability of no heads = p(TT...T) = q^n, since p(T) = q, and the tosses
are independent. To find b(1; n, p), note that one head can occur in n
distinct ways: HTT...T, THT...T, ..., TT...TH. Each of these ways has
probability pq^(n−1). Thus b(1; n, p) = p(X = 1) = probability of one head =
npq^(n−1).

In general, suppose that we are given an integer x (0 ≤ x ≤ n) and we wish
to find p(X = x) = b(x; n, p). An outcome that has exactly x heads (and
n − x tails) can be found by writing a sequence of x heads and n − x tails.
Such an outcome has probability p^x q^(n−x) because we multiply probabilities
(the tosses are independent) and each head yields a factor p and each tail a
factor q. There are as many outcomes with x heads as there are ways to
choose x places from among the n in which to locate the H’s. Thus there
are C(n, x) outcomes, and the probability of x heads is

$$b(x; n, p) = \binom{n}{x} p^x q^{n-x} \qquad (x = 0, 1, \ldots, n) \qquad (7.9)$$


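We can illustrate for n = 4. Grouped by the number x of heads, the outcomes,
their individual probabilities, and the number of outcomes are

x   Outcomes                          Individual probability   Number of outcomes
0   TTTT                              q^4                      C(4, 0) = 1
1   HTTT THTT TTHT TTTH               pq^3                     C(4, 1) = 4
2   HHTT HTHT HTTH THHT THTH TTHH     p^2q^2                   C(4, 2) = 6
3   HHHT HHTH HTHH THHH               p^3q                     C(4, 3) = 4
4   HHHH                              p^4                      C(4, 4) = 1

Thus the probabilities for x = 0, 1, ..., 4 are, respectively, b(0; 4, p) = q^4,
b(1; 4, p) = 4pq^3, b(2; 4, p) = 6p^2q^2, b(3; 4, p) = 4p^3q, and b(4; 4, p) = p^4.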

To compute E(X), the mean number of heads, and σ(X), the standard
deviation, it is convenient to use the techniques of Section 4 of Chapter 5
and Section 3 of Chapter 6. We let N_1 = number of heads on the first toss,
N_2 = number of heads on the second toss, etc. Then the N_i are independent
and each N_i has the same distribution, because

$$N_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } q \end{cases}$$

Thus E(N_i) = 1·p + 0·q = p, while σ²(N_i) = E(N_i²) − [E(N_i)]² = 1²·p + 0²·q −
p² = p − p². More simply, σ²(N_i) = p(1 − p) = pq. Then since

$$X = N_1 + N_2 + \cdots + N_n$$

we have

$$\mu = E(X) = E(N_1) + \cdots + E(N_n) = p + \cdots + p = np \qquad (7.10)$$

Similarly, using Equation 6.28, we have

$$\sigma^2(X) = \sigma^2(N_1) + \cdots + \sigma^2(N_n) = pq + \cdots + pq = npq$$

Thus

$$\sigma = \sigma(X) = \sqrt{npq} \qquad (7.11)$$


Equations 7.9, 7.10, and 7.11 constitute the complete answers to questions 
a, b, and c proposed above. 
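Equations 7.9 through 7.11 can be checked numerically by computing the mean and variance straight from the distribution, without using the N_i decomposition. This is a minimal sketch. The parameter values n = 4 and p = .3 are arbitrary choices for illustration, and math.comb requires Python 3.8 or later.

```python
from math import comb, sqrt

def b(x, n, p):
    """Binomial probability b(x; n, p) = C(n, x) p^x q^(n - x), as in Equation 7.9."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Illustrative parameters (not from the text): n = 4 tosses, p(H) = .3
n, p = 4, 0.3
probs = [b(x, n, p) for x in range(n + 1)]

total = sum(probs)                                   # all probabilities add to 1
mean = sum(x * px for x, px in enumerate(probs))     # E(X) from the definition
var = sum(x**2 * px for x, px in enumerate(probs)) - mean**2

print(f"total = {total:.6f}")
print(f"mean  = {mean:.6f}   np        = {n * p:.6f}")
print(f"sd    = {sqrt(var):.6f}   sqrt(npq) = {sqrt(n * p * (1 - p)):.6f}")
```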

The values of b(x;n, p) have been extensively tabulated. Appendix E is 
a brief table. The following examples illustrate its use. 


14 Example
A seed has probability .9 of germinating when planted. Six seeds are
planted. What is the probability that at least 5 of the seeds germinate?


Method. We regard the planting of a seed as the toss of a coin, and we
regard “heads” as nongerminating with p = .1. Thus q = .9 is the probability
of “tails,” i.e., germination. [Appendix E gives values of b(x; n, p) for p = .1,
.2, ..., .5. Thus we arrange for “heads” to have probability ≤ .5.] Under
this interpretation of a planting as a coin toss, we have n = 6 “tosses” and
p = .1. We want the probability of 5 or 6 “tails,” so we compute the
probability of 1 or 0 “heads.” This is b(0; 6, .1) + b(1; 6, .1). Going to
Appendix E, under the column p = .1 and alongside n = 6, we find b(0) = .5314
and b(1) = .3543. The required probability is .5314 + .3543 = .8857.
Using Equation 7.9, we could explicitly write

$$b(0) + b(1) = (.9)^6 + 6(.9)^5(.1)$$

However, this computation is somewhat long and the work has already
been done in Appendix E.
The choice of .1 for p, rather than the more natural .9, can be avoided by
using the formula

$$b(x; n, p) = b(n - x; n, 1 - p) \qquad (7.12)$$

In fact, the left-hand side is the probability of x heads in a toss of n coins
where p(H) = p. The right-hand side may be interpreted as the probability
of n − x tails in a toss of n coins where p(T) = 1 − p. Clearly these values are
the same. More formally, we can prove Equation 7.12 directly using
Equation 7.9:

$$b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}$$

and

$$\begin{aligned} b(n-x; n, 1-p) &= \binom{n}{n-x} (1-p)^{n-x} [1 - (1-p)]^{n-(n-x)} \\ &= \binom{n}{x} (1-p)^{n-x} p^x = b(x; n, p) \end{aligned}$$

since C(n, n − x) = C(n, x), proving Equation 7.12.
In the above example, had we used p = .9 = probability of germination,
we would find, with the help of Equation 7.12,

b(5; 6, .9) = b(6 − 5; 6, 1 − .9) = b(1; 6, .1)
and
b(6; 6, .9) = b(6 − 6; 6, 1 − .9) = b(0; 6, .1)

We computed b(5; 6, .9) + b(6; 6, .9) by evaluating b(1; 6, .1) + b(0; 6, .1).
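The two routes through Example 14, and Equation 7.12 itself, can be checked by direct computation instead of Appendix E. This is a minimal sketch. The grid of n and p values in the loop is an arbitrary choice, and a passing check on a grid illustrates the identity but does not replace the proof above.

```python
from math import comb

def b(x, n, p):
    # Binomial probability b(x; n, p) from Equation 7.9
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 14 with "heads" = failure to germinate (p = .1): 0 or 1 heads
via_failures = b(0, 6, 0.1) + b(1, 6, 0.1)
# The same question with "heads" = germination (p = .9): 5 or 6 heads
via_successes = b(5, 6, 0.9) + b(6, 6, 0.9)
print(f"{via_failures:.4f} {via_successes:.4f}")

# Equation 7.12, b(x; n, p) = b(n - x; n, 1 - p), checked on a small grid
for n in range(1, 11):
    for x in range(n + 1):
        for p in (0.1, 0.25, 0.5, 0.8):
            assert abs(b(x, n, p) - b(n - x, n, 1 - p)) < 1e-12
print("Equation 7.12 holds on every case checked")
```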

The above example shows how a process may be interpreted as a coin
toss. Some other examples are: The birth of a child may be regarded as a
toss of a coin, with “boy” as “heads.” If p is the probability of a boy being
born, then b(x; n, p) is the probability that, of n newly born babies in a
hospital, exactly x are boys. The underlying assumptions, as in our coin
tossing, are that (a) each birth has the same probability p of being a boy, and
(b) the births are independent. Similarly, if 8 dice are tossed, the probability
that 2 or fewer 5’s are tossed is b(2; 8, 1/6) + b(1; 8, 1/6) + b(0; 8, 1/6). Here
the toss of a 5 is regarded as “heads,” and p = 1/6, n = 8.

In general, we say that we have Bernoulli trials if a sequence of
experiments is performed such that (a) the probability of success for each
trial is the same number p, with the probability of failure equal to q = 1 − p,
and (b) the trials are independent. We can then simply state: For n Bernoulli
trials, the probability of exactly x successes (x = 0, 1, ..., n) is
b(x; n, p) = C(n, x) p^x q^(n−x).

This formula simplifies a bit for p = 1/2. In this case q = 1 − p = 1/2, and
p^x q^(n−x) = (1/2^x)(1/2^(n−x)) = 1/2^n. Thus

$$b(x; n, \tfrac{1}{2}) = \binom{n}{x} \Big/ 2^n \qquad (7.13)$$

(Compare Example 25 of Chapter 2.)
We now put these considerations in the form of a definition.


15 Definition


Let X = number of successes in n Bernoulli trials, where p = probability of
a success on any individual trial. The distribution of X, denoted b(x; n, p),
or simply b(x) if n and p are understood, is called the binomial distribution
(with parameters n and p). The term b(x) is given by the formula

$$b(x; n, p) = \binom{n}{x} p^x q^{n-x} \qquad (x = 0, 1, \ldots, n) \qquad (7.9)$$

As in Example 13, the mean μ and standard deviation σ for X are given by

$$\mu = np \qquad (7.10)$$
$$\sigma = \sqrt{npq} \qquad (7.11)$$


The term “binomial” is used here because the expressions C(n, x) p^x q^(n−x)
are precisely the terms in the binomial expansion of (q + p)^n:

$$(q+p)^n = q^n + n q^{n-1} p + \binom{n}{2} q^{n-2} p^2 + \binom{n}{3} q^{n-3} p^3 + \cdots + p^n$$

or

$$(q+p)^n = \sum_{x=0}^{n} b(x; n, p)$$

Note that since q + p = 1, we have

$$1 = \sum_{x=0}^{n} b(x; n, p) \qquad (7.14)$$

This is a special case of Equation 5.8,

$$1 = \sum_x p(x)$$

which is valid for all distributions.


16 Example


A multiple-choice test has 10 problems, each with 5 choices. Assuming pure
guesswork on the part of a student taking the exam: a. Find the probability
that the student gets a grade of 30 percent or more. b. Find the probability
that he gets a grade of 50 percent or more. c. Find the average grade and
standard deviation of the grade for a test done by pure guesswork.


Method. Regard “success” as guessing correctly. Then p = 1/5 = .2 and
n = 10.

a. Writing b(x) instead of b(x; 10, .2), we must find b(3) + b(4) + ... +
b(10). This is simply 1 − [b(0) + b(1) + b(2)], by Equation 7.14. Using
Appendix E, under p = .2, n = 10, we find b(0) + b(1) + b(2) = .1074 +
.2684 + .3020 = .6778. Thus the required probability is 1 − .6778 = .3222.

b. The probability of a grade of 50 percent or more is b(5) + b(6) + ... +
b(10) = .0264 + .0055 + .0008 + .0001 = .0328 (b(9) and b(10) are each less
than .00005 and do not affect the fourth decimal place). The fact that this
figure is so low will not be surprising to any student who guesses a lot on
multiple-choice tests.

c. The mean number of correct guesses is μ = np = 10(.2) = 2. The
average grade is thus 20 percent.³ Similarly, σ = √(npq) = √((10)(.2)(.8)) =
√1.6 = 1.26. The standard deviation for the grade is 12.6 percent.
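Example 16 can be reproduced without Appendix E. This sketch evaluates Equation 7.9 directly. The default arguments n = 10 and p = .2 of the illustrative helper b simply encode this particular test.

```python
from math import comb, sqrt

def b(x, n=10, p=0.2):
    # b(x; 10, .2): probability of exactly x correct guesses on 10 questions
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_30 = 1 - (b(0) + b(1) + b(2))           # a. 3 or more correct, via Equation 7.14
p_50 = sum(b(x) for x in range(5, 11))    # b. 5 or more correct
mu = 10 * 0.2                             # c. mean number correct, np
sigma = sqrt(10 * 0.2 * 0.8)              #    standard deviation, sqrt(npq)

print(f"a. {p_30:.4f}  b. {p_50:.4f}")
# Express the count-based mean and standard deviation as percentage grades
print(f"c. mean grade {10 * mu:.0f} percent, standard deviation {10 * sigma:.1f} percent")
```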


17 Example

It is claimed that 60 percent of the population of a large city favors candidate
A. It is decided by his opponent, candidate B, to take a poll of 10 people at
random to check this figure. Candidate B has decided to reject the 60 percent
figure as ridiculous if his poll shows 4 or fewer people favoring candidate A.
Is B a bit rash?


Method. Let us assume that p = .6 is the probability of favoring A. Then
we have Bernoulli trials with p = .6, n = 10.⁴ Candidate B apparently
considers the probability that X ≤ 4 ridiculously low, so low that the occurrence
of X ≤ 4 is enough to convince him that the assumption p = .6 is wrong. Let
us compute p(X ≤ 4). The probability of 4 or fewer people favoring A
is b(4; 10, .6) + b(3; 10, .6) + ... + b(0; 10, .6). By Equation 7.12 this is
b(6; 10, .4) + b(7; 10, .4) + ... + b(10; 10, .4), which is equal to .1115 +
.0425 + ... + .0001 = .1663. This is not particularly small. (It is about 1 in
6.) The evidence would not generally be considered strong enough to reject
the assumption p = .6.

Generally, statisticians and such take a small figure, say .05, or .01 if they
are feeling very conservative, and use it as a measure of the probability of
an extremely rare event. If they want to test a hypothesis (e.g., the 60 percent
figure above), they will then devise a test that can succeed only with this
small probability, given that hypothesis. If the test succeeds, they will
reject the hypothesis. In the above example candidate B was very rash,
because a probability of .166 is not considered so small.

If the 5 percent rejection level were to be used, B should reject the
hypothesis of 60 percent for A if 2 or fewer people out of 10 favor A, since
p(X ≤ 3) = .0425 + .0106 + .0016 + .0001 = .0548 exceeds .05, while
p(X ≤ 2) = .0106 + .0016 + .0001 = .0123 does not. (Three or fewer will work
if he shades 5 to 6 percent.)
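The rejection rule in Example 17 can be found mechanically: tabulate p(X ≤ k) under the hypothesis p = .6 and take the largest k whose probability does not exceed the chosen level. This is a sketch, assuming the 5 percent level used in the text; the helper names b and lower_tail are hypothetical.

```python
from math import comb

def b(x, n, p):
    # Binomial probability b(x; n, p), Equation 7.9
    return comb(n, x) * p**x * (1 - p)**(n - x)

def lower_tail(k, n=10, p=0.6):
    """p(X <= k): probability that k or fewer of the n people polled favor A."""
    return sum(b(x, n, p) for x in range(k + 1))

for k in range(6):
    print(f"p(X <= {k}) = {lower_tail(k):.4f}")

# Largest cutoff whose probability under p = .6 does not exceed the 5 percent level
level = 0.05
cutoff = max(k for k in range(11) if lower_tail(k) <= level)
print(f"reject p = .6 if X <= {cutoff}")
```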

Another problem B might want solved is to estimate, with some precision,
what percentage of the population actually does favor A. Here your intuition
tells you that a random sample of 10 is too small to give the figure with any
accuracy. But beware! Your intuition may also tell you that the samples
used by national polling organizations and television polls are also too
small. (“What,” you will say, “only 1,000 people out of 1,000,000 and they
want me to believe their prediction!”) Hold off making such intuitive guesses
until you finish reading this chapter.


3 Why? Can you justify this step from 2 to 20 percent?

4 Strictly speaking this is not quite correct. If one person is chosen, at random, and a different
person is then chosen, the events will not be independent. If the population is 100,000 voters,
if 60,000 favor A, and if the first person polled happens to favor A, then the probability for the
next is, technically, 59,999/99,999. This is very close to, but not exactly, .6, and it will remain
close for 10 trials. Thus we may regard this process as very accurately approximated by
Bernoulli trials.


EXERCISES 


In the following exercises, use Appendix E wherever applicable. 


1. Using Equation 7.9 or 7.13, check the accuracy of Appendix E by
actually computing
a. b(3; 6, .5)     b. b(2; 4, .2)     c. b(3; 4, .1)
d. b(3; 5, .3)     e. b(6; 10, .15).


2. A fair coin is tossed 10 times. What is the probability of 6 heads and
4 tails? What is the expected number of heads and the standard deviation for
the number of heads?


3. A stamp dealer is known to send out defective stamps with probability 
.1. If 10 stamps are ordered, what is the probability that at most 1 defective 
stamp is delivered? 


4. Assume that 30 percent of the adult population of a certain city are 
college graduates. Eight of these adults find themselves (independently and 
at random) in a doctor’s office. Using Appendix E, find the probability that 
3 or fewer are college graduates. 


5. A college has accepted 700 students for admission. It knows by past 
experience that a student will actually decide to come to the college with 
probability 30 percent. What is the expected number of students who will 
come to the college, and with what standard deviation? 


6. Assume that the chance of a twin birth occurring is 3. If 200 births
take place in a hospital during 2 weeks, what is the probability that 3 or more
twin births occur? [Express your answer in terms of b(x; n, p).] Do not
evaluate. Give as simple an answer as possible.


7. A manufacturer of flashlights knows that his manufacturing process 
produces defective flashlights with probability .1. He sends out a shipment 
of 10 flashlights. What is the probability that 8 or more are not defective? 


8. Hamsters infected with a certain disease are known to recover with 
probability 20 percent. A drug company claims that it has an effective cure 
for this disease, but you, as a good scientist, are skeptical. You decide to 
test the drug on 9 hamsters. How many hamsters will have to recover for 
you to admit they have a case? (Note: Be liberal, and consider that if an 
event with probability .10 or less occurs, you will lose your skepticism.
Similarly, test the probabilities .05 and .01 as your measure of a rare event.) 


9. Prove: For fixed n and p, b(x + 1) > b(x) for x < μ − q and b(x + 1) <
b(x) for x > μ − q. Thus we can roughly say that b(x) increases with x until
x = μ − q, after which b(x) decreases with x. [Hint: Consider the inequality
b(x + 1) > b(x), or b(x + 1)/b(x) > 1.]


4 NORMAL APPROXIMATION 


A glance at the graphs of various binomial distributions for large n indicates
that the shape of each graph is remarkably similar to a normal curve. Figure
7.11 plots b(x; 10, .3) for x = 0, ..., 10 in the form of a bar graph. Figure
7.12 plots b(x; 12, .6) similarly. In each case the similarity to a normal curve
is striking. It is one of the remarkable results in mathematics that, for large n,
the binomial distribution b(x; n, p) can be closely approximated by a normal
distribution with the same mean μ and standard deviation σ. We shall first
state this theorem and then show how to use it. A proof is beyond the scope
of this text.

[Figure: bar graph of b(x; 10, .3) for x = 0, ..., 10]

7.11 Graph of b(x; 10, .3)

[Figure: bar graph of b(x; 12, .6) for x = 0, ..., 12]

7.12 Graph of b(x; 12, .6)


18 Theorem


For large values of n, a binomial distribution can be approximated by a
normal distribution with the same mean and standard deviation as the
binomial distribution. More explicitly, suppose n and p are given,
determining a binomial distribution with μ = np and σ = √(npq). Then, if x_1
and x_2 are integers and if n is large, we have

$$\sum_{x=x_1}^{x_2} b(x; n, p) \approx p\left(x_1 - \tfrac{1}{2} \le x \le x_2 + \tfrac{1}{2}\right) \qquad (7.15)$$

Here we are using p(a ≤ x ≤ b) to denote the probability that the real
variable x is between a and b, assuming that x is normally distributed with
mean μ and standard deviation σ.⁵

The left-hand side of the approximation 7.15 is the probability that the
integer variable x is between x_1 and x_2 (inclusive) if x is distributed
according to the binomial distribution. The significance of the values x_1 − ½
and x_2 + ½, rather than x_1 and x_2, in the right-hand side of Equation 7.15 is
explained on page 241. The approximation is illustrated in Fig. 7.13, in which
the graph of b(x; 16, .5) is given along with the graph of the probability
density of a normally distributed variable with the same mean μ = 8 and
standard deviation σ = 2. The probabilities b(4) + b(5) + ... + b(10) and
p(3½ ≤ x ≤ 10½) are both illustrated as areas.
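Approximation 7.15 can be tried on the case drawn in Fig. 7.13, comparing b(4) + ... + b(10) for n = 16, p = .5 with the normal area from 3½ to 10½. This is a minimal sketch. The names binomial_sum and normal_approx are illustrative, and erf replaces the normal table.

```python
from math import comb, erf, sqrt

def Phi(z):
    # Area under the standardized normal curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

def binomial_sum(x1, x2, n, p):
    # Left-hand side of 7.15: b(x1) + b(x1 + 1) + ... + b(x2)
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(x1, x2 + 1))

def normal_approx(x1, x2, n, p):
    # Right-hand side of 7.15: normal area from x1 - 1/2 to x2 + 1/2,
    # with mu = np and sigma = sqrt(npq)
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return Phi((x2 + 0.5 - mu) / sigma) - Phi((x1 - 0.5 - mu) / sigma)

# The case drawn in Fig. 7.13: n = 16, p = .5, x from 4 to 10
exact = binomial_sum(4, 10, 16, 0.5)
approx = normal_approx(4, 10, 16, 0.5)
print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```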

In practice, the approximation formula 7.15 works even when n is not too
large, say as small as 10. The precise statement of Theorem 18 is given by a
statement involving limits. One form of this is

$$\sum_{\mu + z_1\sigma \le x \le \mu + z_2\sigma} b(x; n, p) \to A_n(z_1, z_2) \quad \text{as } n \to \infty$$

where A_n(z_1, z_2) is the area under the normal curve y = n(z) between z = z_1
and z = z_2.


19 Example 


Forty percent of the population of a city is opposed to city parades. Ten 
people are sampled. What is the probability that between 3 and 5 of the 
people polled (inclusive) oppose parades? Approximate the answer using 
the normal distribution. 


5 For the remainder of the text we shall use the notation p(a < x < b) only for a normally 
distributed variable x. As usual, z will denote a normally distributed variable with mean 0 and 
standard deviation 1. 


258 ELEMENTARY PROBABILITY THEORY 


7.13 Comparison of Binomial and Normal Distribution 


Method. We can assume a binomial distribution with n = 10, p = .4. (See
the footnote on page 254.) The required answer is clearly b(3) + b(4) + b(5),
where we are writing b(x) = b(x; 10, .4). Using Appendix E, the answer
is .2150 + .2508 + .2007 = .6665.

In this example the normal approximation procedure is far longer than
this direct method. However, it illustrates the procedure well. We have
n = 10, p = .4. Thus μ = np = 4, and σ = √((10)(.4)(.6)) = √2.4 = 1.55
(to 2 decimal places). The required probability that 3 ≤ x ≤ 5 (x an integer)
is replaced by the probability p(2.5 ≤ x ≤ 5.5), where x is real and normally
distributed with μ = 4, σ = 1.55. Setting x = 4 + 1.55z, the inequality

$$2.5 \le x \le 5.5$$

becomes

$$\begin{aligned} 2.5 \le 4 + 1.55z &\le 5.5 \\ -1.5 \le 1.55z &\le 1.5 \\ -.97 \le z &\le .97 \end{aligned}$$

The required probability, using Appendix D, is
2N(.97) = (2)(.3340) = .6680

a difference of only .0015 from the exact answer, .6665.
The accuracy of the normal approximation has thus been illustrated for 
the rather moderately sized n = 10. 
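The half-unit shifts in Equation 7.15 matter a great deal for small n. The sketch below repeats Example 19 and, for contrast, also computes the normal area from 3 to 5 with no shift. The variable names are illustrative, and erf stands in for Appendix D.

```python
from math import comb, erf, sqrt

def Phi(z):
    # Area under the standardized normal curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 10, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial answer b(3) + b(4) + b(5)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(3, 6))

# Normal approximation with the half-unit correction of Equation 7.15 ...
corrected = Phi((5.5 - mu) / sigma) - Phi((2.5 - mu) / sigma)
# ... and, for contrast, the same area taken only from 3 to 5
uncorrected = Phi((5 - mu) / sigma) - Phi((3 - mu) / sigma)

print(f"exact {exact:.4f}  corrected {corrected:.4f}  uncorrected {uncorrected:.4f}")
```

Without the shift the approximation misses the exact answer by roughly .18, while the corrected value differs from it by less than .001.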


20 Example 


Two hundred fair coins are independently tossed. What is the probability 
that the number of heads is between 90 and 115 inclusive? 


Method. We have a binomial distribution with n = 200, p = 1/2. Thus μ =
np = 100, and σ = √(npq) = √50 = 7.07. The required probability that
90 ≤ x ≤ 115 is approximated by p(89.5 ≤ x ≤ 115.5), where x is normally
distributed with μ = 100 and σ = 7.07. Setting x = 100 + 7.07z, we have the
following equivalent inequalities:

$$\begin{aligned} 89.5 \le x &\le 115.5 \\ 89.5 \le 100 + 7.07z &\le 115.5 \\ -10.5 \le 7.07z &\le 15.5 \\ -1.49 \le z &\le 2.19 \end{aligned}$$

But
p(−1.49 ≤ z ≤ 2.19) = N(2.19) + N(1.49) = .4857 + .4319 = .9176


Thus the required probability is approximately .9176.
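For n = 200 the exact binomial sum in Example 20 is tedious by hand but easy by machine, which gives a direct check on the approximation. This is a sketch that relies on Python's arbitrary-precision integers. The value approx should agree with the text's .9176 up to the rounding of z to two places.

```python
from math import comb, erf, sqrt

def Phi(z):
    # Area under the standardized normal curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 200
# For a fair coin every outcome has probability 1/2^n (Equation 7.13),
# so the exact sum can be formed with integers and divided once.
exact = sum(comb(n, x) for x in range(90, 116)) / 2**n

mu, sigma = n / 2, sqrt(n / 4)
approx = Phi((115.5 - mu) / sigma) - Phi((89.5 - mu) / sigma)

print(f"exact {exact:.4f}  normal approximation {approx:.4f}")
```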


21 Example 


A fair die is tossed 100 times. What is the probability that the number of
6’s which turn up is 15 or more?


Method. The number x of 6’s is governed by a binomial distribution
with n = 100, p = 1/6. Thus μ = np = 16.67 and σ = √((100)(1/6)(5/6)) =
(5/3)√5 = 3.73. The probability that x is 15 or more is b(15) + b(16) + ... .
The normal approximation is simply p(14.5 ≤ x < ∞).⁶ Setting
x = 16.67 + 3.73z, the inequality

$$14.5 \le x$$

becomes

$$\begin{aligned} 14.5 &\le 16.67 + 3.73z \\ -2.17 &\le 3.73z \\ -.58 &\le z \end{aligned}$$

Using Appendix D, p(z ≥ −.58) = .7190. Thus the required probability
is approximately .7190.


6 Note that 100.5 is the proper upper limit. This is over 22σ larger than μ and can therefore be
replaced by ∞.
22 Example


A fair coin is tossed 100 times. Find a number N such that the probability
of N or more heads is approximately .90.


Method. Here n = 100, p = 1/2. Thus np = 50, σ = √((100)(1/2)(1/2)) = 5. The
probability of N or more heads is b(N) + b(N + 1) + ... ≈ p(N − ½ ≤ x).
Set x = 50 + 5z. We have the following equivalent inequalities.

$$\begin{aligned} N - \tfrac{1}{2} &\le x \\ N - \tfrac{1}{2} &\le 50 + 5z \\ N - 50.5 &\le 5z \\ \frac{N - 50.5}{5} &\le z \end{aligned}$$

Thus

$$p\left(N - \tfrac{1}{2} \le x\right) = p\left(\frac{N - 50.5}{5} \le z\right) = A_n\left(\frac{N - 50.5}{5}, \infty\right)$$

We wish this latter probability to be .90. We therefore try to find a value c
such that
A_n(c, ∞) = .90

From Appendix D, c = −1.28. Thus we set

$$\frac{N - 50.5}{5} = -1.28, \qquad N - 50.5 = -6.40, \qquad N = 44.1$$

Since N must be an integer, we choose N = 44. The probability that x ≥ 44
will be slightly larger than .90.
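A somewhat different way of proceeding is as follows: From Appendix D
we find that
A_n(−1.28, ∞) = .90

[Figure: normal curve for μ = 50, σ = 5, with 43.5, 43.6, and 44 marked on the x axis; the area to the right of 43.6 is .90, and the area to the right of 43.5 is the probability of 44 or more heads]

7.14 Choosing N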


SOME STANDARD DISTRIBUTIONS 261 


Thus 90 percent of the area under the normal curve is to the right of −1.28.
Translating this fact to a normal distribution with μ = 50 and σ = 5, 90
percent of the area under this curve is to the right of μ − 1.28σ = 50 − (5)(1.28)
= 43.6. But, referring to Fig. 7.14, the probability that x ≥ 44 is
approximately the area to the right of 43.5, which is somewhat larger than
.90. Thus we choose N = 44.
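Example 22 can also be settled exactly, because for a fair coin p(X ≥ N) is a finite sum. The sketch below computes both the text's normal-approximation value and the largest N whose exact tail probability is at least .90. The constant −1.2816 is the standard normal point with 90 percent of the area to its right; the text's −1.28 is its two-place value. The helper upper_tail is a hypothetical name.

```python
from math import comb

def upper_tail(N, n=100):
    """Exact p(X >= N) for the number X of heads in n tosses of a fair coin (Equation 7.13)."""
    return sum(comb(n, x) for x in range(N, n + 1)) / 2**n

# Normal route from the text: (N - 50.5)/5 = c, with c the point having
# 90 percent of the area to its right (about -1.2816; the text uses -1.28)
c = -1.2816
N_normal = 50.5 + 5 * c

# Exact route: the largest N whose tail probability is still at least .90
N_exact = max(N for N in range(101) if upper_tail(N) >= 0.90)

print(f"normal route: N = {N_normal:.2f}")
print(f"exact route:  N = {N_exact}, p(X >= N) = {upper_tail(N_exact):.4f}")
```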


EXERCISES 


Where appropriate, use the normal approximation to the binomial dis- 
tribution. 


1. A coin is tossed 100 times. What is the probability that the relative
frequency of heads is between 45 and 55 percent inclusive? Find this
probability if the coin is tossed 200 times. Similarly, find the probability if the
coin is tossed 500 times.


2. In Example 1 of Chapter 1, 3 dice were tossed 100 times. It was ob- 
served, experimentally, that the high number on the dice was 5 in 33 per- 
cent of the experiments. In Example 6 of Chapter 2 we found that the 
probability of a high of 5 was 28.2 percent. Find the probability that in a run 
of 100 experiments, the high of 5 will not occur between 24 and 32 times, 
inclusive. 


3. A die is tossed 120 times. What is the probability that a 5 turns up 
between 15 and 25 times, inclusive? For what range of values can you pre- 
dict that a 5 will turn up with probability .90 or more? .95 or more? .99 or 
more? 


*4. A coin is known to be fair. It is desired to toss this coin N times.
Choose N so that the relative frequency of heads will be between .49 and
.51 inclusive, with probability .95 or more. Similarly, find N if the relative
frequency of heads is to be between .48 and .52 with probability .98 or
more.


5. A plant seed is known to germinate with probability .90. If 100 seeds 
are planted, what is the probability that 87 or more seeds germinate? 


6. In Exercise 5 it is essential to have 100 or more germinated seeds. 
How many seeds should be planted to guarantee this with a 99 percent 
probability? 


7. Assume that 6,000 people in a town favor candidate A and 4,000 
people favor candidate B. A poll is taken of 20 people. (The people are 
chosen independently and at random.) What is the probability that 10 or 
more people polled favor candidate B? What is the probability that 15 or
fewer people favor A? 


8. Each student in a class of 30 is told to toss 5 coins and report whether
or not 4 heads and 1 tail occurred. Each student performs this experiment
3 times, for a total of 90 experiments. What is the probability that between
12 and 16 occurrences, inclusive, of a 4-head and 1-tail split are reported?


9. A person claims that he can guess the suit of a card chosen at random 
from a deck. You test him 50 times, each time carefully shuffling the deck. 
Assuming that his claim is false, and that the person guesses correctly with
probability 1/4, what is the probability that he guesses correctly 18 or more
times? Find an N so that the probability of N or more correct guesses is about
.10.


10. A multiple-choice test has 100 questions on it, each with 5 choices. 
Assuming random answers, what is the probability of achieving a grade of 
25 percent or higher? Set a grade that will automatically fail 98 percent of 
the guessers. 


11. In Exercise 10 suppose that all the questions have two answers which 
are obvious nonsense and which are eliminated before guessing randomly 
on the other 3. What is the probability of a grade of 40 percent or higher? 
Of 50 percent or higher? If someone gets a grade of 53 percent on this exam, 
present an argument proving that the person knows something about the 
subject. 


12. Do Exercise 8 of Section 6.3 using the normal approximation to the 
binomial distribution. 


5 STATISTICAL APPLICATIONS 


One of the current myths about statisticians is that “statistics can prove 
anything.” Another opposing myth is that “‘statistics can prove nothing.” 
We now consider some statistical applications of the results of this chapter 
to learn more about the applications of probability theory. In all cases the 
idea behind these methods is that if our assumptions show that an event 
has a very small probability but nevertheless occurs in an experiment, 
then we may have cause to reject our assumptions. 


23 Example 

You are going to test a die to see if it is honest. You decide to test whether a
6 turns up about 1/6 of the time, as it should if the die is honest. You toss the
die 600 times, independently, and you observe that a 6 turned up 120 times.
You are undecided. Is 120 too large, or is 120 reasonably close to the
expected number (100) of 6’s?


Method. We have no business performing an experiment and waiting for 
the results before deciding what to do with them! However, this is a typically 
human error, so let us see what we might have done. 

Prior to the experiment you decide to toss the die 600 times. Assuming
p = 1/6 (the probability of a 6), you have a binomial distribution with
μ = np = 100 and σ = √(npq) = √((600)(1/6)(5/6)) = √83.3 = 9.13. Therefore, you
expect the number x of successes to be “close” to μ = 100. If x is too far
from 100, you will reject the assumption that p = 1/6.

We must now decide how far x should be from 100 in order to reject our
assumption that p = 1/6. Let us find an interval about 100 large enough so
that x will fall in this interval with probability 95 percent = .9500. (The
figure 95 percent is an arbitrary decision on our part.) According to
Appendix D, N(1.96) = .4750 = 47.5 percent. Thus A(−1.96, 1.96) = .9500,
and 95 percent of the area under the normal curve falls between z = −1.96
and z = 1.96. In terms of an arbitrary normal curve, we have p(μ − 1.96σ <
x < μ + 1.96σ) = .9500. In our case μ = 100, σ = 9.13. Since 1.96 × 9.13 =
17.9, we have p(82.1 < x < 117.9) = .9500. The probability that the number
x of 6’s is between 82 and 118, inclusive, is approximately p(81.5 < x < 118.5),
hence greater than .95. We have every right to expect x to be between 82
and 118, because the probability that this occurs is slightly larger than .95.
Thus x < 82 or x > 118 with very small probability (smaller than .05).
Therefore, our test is as follows: If x is not within the interval 82 ≤ x ≤ 118,
we shall reject the hypothesis that p = 1/6. In that case we say that we reject
the hypothesis p = 1/6 at the 5 percent significance level.

In particular, because a 6 occurred 120 times in our test, we will say that
the die is unfair. In doing so we admit that the die may be fair and that an 
event of small probability may have happened. If we want to be very con- 
servative in our rejection, we may decide to reject at the 2 percent, or even 
1 percent, level. In this case the number 1.96, which satisfied p(|z| < 1.96) = 
.95, is replaced by 2.33 [p(|z| < 2.33) = .98] or by 2.58 [p(|z| < 2.58) = 
.99], and we shall reject the hypothesis only when x is outside appropri- 
ately wider limits. 
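A minimal Python sketch of the die test, assuming the normal approximation with the same continuity correction used above; `acceptance_range` is an illustrative name. It uses `NormalDist` from the standard library to find the critical value of a two-sided test at level alpha (about 1.96 at the 5 percent level) and returns the integer range of counts for which the hypothesis p = 1/6 is kept.

```python
import math
from statistics import NormalDist

def acceptance_range(n, p, alpha):
    """Counts lo..hi for which a two-sided test keeps the hypothesis.

    The range is the shortest integer interval whose continuity-corrected
    version (lo - 0.5, hi + 0.5) contains mu - z*sigma .. mu + z*sigma.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96, 2.33, 2.58
    lo = math.floor(mu - z * sigma + 0.5)     # largest lo with lo - 0.5 <= mu - z*sigma
    hi = math.ceil(mu + z * sigma - 0.5)      # smallest hi with hi + 0.5 >= mu + z*sigma
    return lo, hi

observed = 120
for alpha in (0.05, 0.02, 0.01):
    lo, hi = acceptance_range(600, 1 / 6, alpha)
    verdict = "keep" if lo <= observed <= hi else "reject"
    print(f"{alpha:.0%} level: keep p = 1/6 for {lo} <= x <= {hi}; x = {observed}: {verdict}")
```

At the 5 percent level this gives the range 82 to 118 found above, so the observed 120 leads to rejection. The 2 and 1 percent ranges come out as 79 to 121 and 76 to 124, and both contain 120, so at those stricter levels the same data would not lead us to reject.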


24 Example 

Thirty hamsters have a disease from which they ordinarily recover with 
probability .4. A company claims that their remedy will increase the recovery 
rate to .8. Devise a test to decide on the company’s claim. 


Method. Suppose we feed the hamsters the medicine and see what happens.
But first, we shall make up a test, considering only the two alternatives
p = .4 (that the remedy has no value at all) and p = .8 (as claimed by the
manufacturer).

1. If p = .4 and n = 30, then μ = 12 and σ = √(npq) = √7.2 = 2.68. Thus, if x is the number that recover, and if x is “much” larger than 12, we shall reject the hypothesis that p = .4. (Note: We are deciding to consider whether x is much larger than 12 rather than whether x is very far away from 12, above or below.) Let us choose the 5 percent significance level. We find c with N(c) = .4500. In that case p(z < c) = .9500 and p(z > c) = .05. Using Appendix D we find c = 1.65. Thus, for a normally distributed x, p(x > μ + 1.65σ) = p(z > 1.65) = .05. For μ = 12, σ = 2.68, we have μ + 1.65σ = 12 + 4.43 = 16.43, and therefore p(x > 16.43) = .05. If x is the number of recoveries, the probability that x ≥ 17 is approximately p(x ≥ 16.5), hence smaller than .05. Thus, at the 5 percent significance level, we reject p = .4 if it turns out that x ≥ 17.

2. If p = .8, as claimed, and n = 30, then μ = 24 and σ = √4.8 = 2.19. If x hamsters recover, where x is “much” less than 24, we shall reject the hypothesis p = .8. Choosing the 5 percent significance level, we find as above that p(z < −1.65) = .05. (Thus 95 percent of the time z will be greater than or equal to −1.65.) Since μ = 24, σ = 2.19, the value z = −1.65 corresponds to x = 24 − (2.19)(1.65) = 20.39. Thus p(x < 20.39) = .05. If x is the number of recoveries, the probability that x ≤ 19 is approximately p(x ≤ 19.5), hence smaller than .05. Thus at the 5 percent significance level, we reject the hypothesis p = .8 if it turns out that x ≤ 19.

We summarize our test as follows. We feed the remedy to the hamsters and count the number x of recoveries. If x ≥ 17, we affirm that p ≠ .4. (The medicine has some effect.) If x ≤ 19, we shall assert that the claim p = .8 is false. In each case we admit that even if p = .4 (or p = .8) is true, there is about a 5 percent probability that we will wrongly reject it. See Fig. 7.15 for a graphical description of this test.

[Figure: the number x of recoveries on a scale from 0 to 30, with one distribution centered at 12 (if p = .4) and one centered at 24 (if p = .8); the upper range of x is marked “reject p = .4 if x is in this range” and the lower range “reject p = .8 if x is in this range.”]
7.15 A Test

What happens if, say, only 6 hamsters survive? We then suspect that the medicine has a negative effect. We go back to the drawing boards and devise
a new test! Our procedure must always be determined first to prevent after- 
the-fact wishful thinking that might wrongfully color our thinking. 
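The two one-sided cutoffs can be computed in the same way. The Python sketch below assumes n = 30, the 5 percent level, and the continuity correction used above; it takes the critical value from `NormalDist` (about 1.645, which Appendix D rounds to 1.65). As a check on the approximation it also reports the exact binomial probability of landing in each rejection region when the corresponding hypothesis is true. The function names are illustrative.

```python
import math
from statistics import NormalDist

def binom_pmf(x, n, p):
    """b(x; n, p), the exact binomial probability."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def upper_cutoff(n, p, alpha):
    """Smallest k such that 'reject if x >= k' has approximate size alpha or less."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    boundary = mu + NormalDist().inv_cdf(1 - alpha) * sigma
    return math.ceil(boundary + 0.5)   # need the area right of k - 0.5 to be small

def lower_cutoff(n, p, alpha):
    """Largest k such that 'reject if x <= k' has approximate size alpha or less."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    boundary = mu - NormalDist().inv_cdf(1 - alpha) * sigma
    return math.floor(boundary - 0.5)  # need the area left of k + 0.5 to be small

n, alpha = 30, 0.05
k_up = upper_cutoff(n, 0.4, alpha)   # reject p = .4 if x >= k_up
k_lo = lower_cutoff(n, 0.8, alpha)   # reject p = .8 if x <= k_lo

# Exact probability of each rejection region under its own hypothesis.
size_up = sum(binom_pmf(x, n, 0.4) for x in range(k_up, n + 1))
size_lo = sum(binom_pmf(x, n, 0.8) for x in range(k_lo + 1))
print(k_up, round(size_up, 4), k_lo, round(size_lo, 4))
```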

Our last examples show how to estimate a population proportion. We first show that,
for a large sample, the observed relative frequency F of success is approxi- 
mately normally distributed. 


25 Theorem 
Let n Bernoulli trials be performed, where the probability of success on each trial is p. Let F be the relative frequency of success. (F is a random variable.) Then

$$\mu_F = E(F) = p \tag{7.16}$$

$$\sigma_F = \sigma(F) = \sqrt{pq/n} \tag{7.17}$$

Furthermore, if n is large, the distribution of F may be approximated by a normal distribution with mean p and standard deviation σ_F. Thus a ≤ F ≤ b with probability approximately p(a ≤ f ≤ b), where f is normally distributed with mean p and standard deviation √(pq/n).

Proof. If X is the number of successes, we have

$$F = \frac{1}{n}X$$

Thus, using Equations 7.10 and 7.11 for the binomial distribution,

$$\mu_F = E(F) = E\left(\frac{1}{n}X\right) = \frac{1}{n}E(X) = \frac{1}{n}\cdot np = p$$

$$\sigma_F = \sigma(F) = \sigma\left(\frac{1}{n}X\right) = \frac{1}{n}\sigma(X) = \frac{1}{n}\sqrt{npq} = \sqrt{pq/n}$$

Finally, let us find the probability that a ≤ F ≤ b. Since F = (1/n)X, this inequality is equivalent to a ≤ (1/n)X ≤ b, or na ≤ X ≤ nb. If we set X = np + z√(npq), this latter inequality has probability approximately

$$p\left(na - \tfrac{1}{2} \le np + z\sqrt{npq} \le nb + \tfrac{1}{2}\right) = p\left(\frac{na - np - \frac{1}{2}}{\sqrt{npq}} \le z \le \frac{nb - np + \frac{1}{2}}{\sqrt{npq}}\right)$$

where z has a standard normal distribution. Since n is large, we can ignore the small summands ±(½/√(npq)), and the probability is approximately

$$p\left(\frac{na - np}{\sqrt{npq}} \le z \le \frac{nb - np}{\sqrt{npq}}\right) = p\left(na \le np + \sqrt{npq}\,z \le nb\right) = p\left(a \le p + \sqrt{pq/n}\,z \le b\right)$$

Thus the probability that a ≤ F ≤ b is approximately p(a ≤ p + √(pq/n) z ≤ b), which is precisely the probability p(a ≤ f ≤ b), where f is a normally distributed variable with mean p and standard deviation √(pq/n). This completes
the proof. 
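Equations 7.16 and 7.17 can be checked numerically by summing over the exact binomial distribution of X and dividing by n. The Python sketch below uses n = 400 and p = .3, illustrative values not taken from the text. It also compares the exact probability that .28 ≤ F ≤ .32 with the normal approximation of the theorem, which drops the ±½ terms, and with the version that keeps them; for moderate n the ½ correction makes a visible difference.

```python
import math
from fractions import Fraction
from statistics import NormalDist

n, p = 400, 0.3
q = 1 - p
pmf = [math.comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

# Mean and standard deviation of F = X/n, computed from the exact distribution.
mean_F = sum(x / n * px for x, px in enumerate(pmf))
sd_F = math.sqrt(sum((x / n - mean_F) ** 2 * px for x, px in enumerate(pmf)))

# P(.28 <= F <= .32); exact rational comparison avoids rounding at the endpoints.
a, b = Fraction(28, 100), Fraction(32, 100)
exact = sum(px for x, px in enumerate(pmf) if a <= Fraction(x, n) <= b)

# Theorem 25: f is normal with mean p and standard deviation sqrt(pq/n).
f = NormalDist(p, math.sqrt(p * q / n))
approx = f.cdf(float(b)) - f.cdf(float(a))

# Same approximation keeping the +-1/2 terms, on the count scale na - 1/2 .. nb + 1/2.
x_norm = NormalDist(n * p, math.sqrt(n * p * q))
approx_half = (x_norm.cdf(float(n * b + Fraction(1, 2)))
               - x_norm.cdf(float(n * a - Fraction(1, 2))))

print(round(mean_F, 6), round(sd_F, 6), round(math.sqrt(p * q / n), 6))
print(round(exact, 4), round(approx, 4), round(approx_half, 4))
```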


26 Example 

An experiment has probability .3 of success. Approximately how many 
times should the experiment be performed in order that the relative
frequency F of success be between .28 and .32 with probability .9?


Method. F has mean μ_F = .30. The condition .28 ≤ F ≤ .32 may be written |F − .30| ≤ .02. According to Appendix D, |z| ≤ 1.65 with probability .9. Thus, for any normal variable f, |f − μ| ≤ 1.65σ with probability .9. We therefore arrange for 1.65σ = .02, and we will have |f − .30| ≤ .02 with probability .9. But σ = √(pq/n) = √((.3)(.7)/n). Thus we choose n so that

$$\begin{aligned} 1.65\sigma &= .02 \\ 1.65\sqrt{.21/n} &= .02 \\ \frac{(1.65)^2(.21)}{n} &= .0004 \\ n &= \frac{(1.65)^2(.21)}{.0004} = 1{,}429.3 \end{aligned}$$

Thus we may choose n = 1,430.
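The arithmetic for n, together with a check of the resulting probability against the exact binomial distribution, fits in a few lines of Python. This sketch takes z = 1.65 from Appendix D as the text does. Because `math.comb(1430, x)` is too large to convert to a float, the binomial probabilities are computed in log space with `math.lgamma`; the helper name `binom_pmf` is illustrative.

```python
import math
from fractions import Fraction

p, half_width, z = 0.3, 0.02, 1.65
n_exact = z**2 * p * (1 - p) / half_width**2   # solves z * sqrt(pq/n) = half_width
n = math.ceil(n_exact)

def binom_pmf(x, n, p):
    """b(x; n, p) computed in log space to avoid float overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_pmf)

# .28 <= F <= .32 means ceil(.28 n) <= X <= floor(.32 n).
lo = math.ceil(Fraction(28, 100) * n)
hi = math.floor(Fraction(32, 100) * n)
coverage = sum(binom_pmf(x, n, p) for x in range(lo, hi + 1))
print(round(n_exact, 1), n, lo, hi, round(coverage, 3))
```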

Before going to our last example, we give a simple result that will save us some work later.


27 Theorem 


For a binomial distribution with parameters n and p, the largest standard deviation possible is ½√n, and it occurs when p = ½. Similarly, the relative frequency F of success has standard deviation at most ½√(1/n).

Proof. The standard deviation for the binomial distribution is √(npq) = √(np(1 − p)). Therefore, we wish to find the largest value of y = p(1 − p) (0 ≤ p ≤ 1). The graph of y = p − p² is given in Fig. 7.16. It is seen there that the largest value occurs when p = ½, for which p(1 − p) = ¼. Thus σ = √(np(1 − p)) is always less than or equal to √(n · ¼) = ½√n. Similarly, √(pq/n) is at most ½√(1/n). This is the result. A more algebraic proof is outlined in the Exercises.


28 Example 

You are polling a city of 1,000,000 voting adults to determine whether they 
favor candidate A or candidate B. You poll 500 such people (independently 
and at random) and you find that 280 favor A and 220 favor B. What an- 
nouncement can you give to the press? 
[Figure: graph of y = p − p² for 0 ≤ p ≤ 1, with its maximum at p = ½.]
7.16 Graph of y = p − p²


Method. The methodology is wrong. First, you plan the experiment and 
then you decide what to do with your results. Here is a plan. 

If p is the fraction of adults favoring A and q is the fraction favoring B, 
our poll may be thought of as n = 500 Bernoulli trials. The relative frequency 
F of those favoring A has mean p and standard deviation √(pq/500). According
to Theorem 27, the standard deviation σ_F is at most ½√(1/500) = ½√.002
= .02236.

Let us decide on a 5 percent significance level. According to Appendix D,
p(|z| ≤ 1.96) = .95. Thus any normally distributed variable is within 1.96σ
of its mean μ with probability .95. Since F is approximately normally distributed
with mean p, we shall reject any value of p if the observed value f of F is not
within 1.96σ_F of p. Since σ_F is at most .02236, we can use the larger number
(1.96)(.02236) = .0438 instead of 1.96σ_F.

In our case the observed value of f was 280/500 = .560. We therefore
reject all values of p outside the range .560 ± .0438. Thus, the possible
values of p are between .516 and .604. If p were outside this range, then the
observed frequency f = .56 would be “too far” from p. We have thus found
a 95 percent confidence interval for p: 51.6 percent ≤ p ≤ 60.4 percent.
We do not say that p is in this interval with 95 percent or more probability.
The true value of p is fixed and is not a random variable. However, we do
say that if p were not in this interval, an event occurred that had probability
smaller than .05.
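The conservative interval can be reproduced in Python. The sketch assumes, as the text does, that the 500 people were polled independently and at random, and it uses the Theorem 27 bound ½√(1/n) in place of the unknown σ_F; `conservative_interval` is an illustrative name.

```python
import math

def conservative_interval(successes, n, z=1.96):
    """Interval f +- z * (1/2) * sqrt(1/n), using the largest possible sigma_F."""
    f = successes / n
    margin = z * 0.5 * math.sqrt(1 / n)
    return f - margin, f + margin

low, high = conservative_interval(280, 500)
print(f"{low:.3f} <= p <= {high:.3f}")
```

Passing z = 1.65 or z = 2.33 gives the corresponding 90 and 98 percent intervals in the same way.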

We can thus issue this statement to the press: “We claim that between 
51.6 and 60.4 percent of the voters of this city favor candidate A. Our polling 
company is accurate more than 95 percent of the time.”
The man in the street sees only n = 500 and a population of 1,000,000.
Therefore, he feels that the sample is far too small. As a pollster, you see a 
possibility that p < .516 or p > .604, but you realize that if this were so, the 
results of your poll were a fluke that could occur only with probability 
smaller than .05. The man in the street should concentrate on the assumptions 
of independence and randomness, because these are the factors that are most 
difficult to achieve. In 1936 a magazine took a famous poll predicting Landon’s 
victory over Roosevelt. They polled people at random from a telephone 
book, thereby eliminating a substantial (pro-Roosevelt) population that had 
no telephones. The mathematical techniques of this chapter, as well as much 
more refined and sophisticated techniques, are well known to pollsters. They 
know, however, that it is not as easy to obtain randomness and independence 
as it is to define these concepts. 


EXERCISES 


1. Prove, algebraically, that the largest value of y = p − p² is y = ¼ and
that this occurs only at p = ½. (Hint: Let p = ½ + x. Then find y as a function
of x.)


2. Devise a test to decide, at the 5 percent significance level, whether a 
coin is fair or not. Do this for 10 tosses, 100 tosses, and 500 tosses. 


3. As in Exercise 2, devise a test to decide at the 10 percent significance 
level whether a coin is fair. Use 100 tosses. Similarly, find a test at the 2 
percent significance level. 


4. Suppose a poll of 300 people shows that 180 people like brand X and 
120 people detest brand X. Assuming that these 300 people were chosen 
independently and at random from a population of 10,000, find a 95 percent 
confidence interval for the percentage of people in the town who like brand 
X. 


5. As in Exercise 4, find a 90 percent confidence interval and a 98 
percent confidence interval. 


6. The Weather Bureau in a certain city predicts rain with probabilities. 
(They might say that there is a 50 percent chance of rain.) Some people say 
that these are actual probabilities, but some say that the Weather Bureau is 
merely avoiding responsibility. It is therefore decided to check if these are 
accurate probabilities. For the next 50 predictions of “rain with probability 
50 percent” it is observed whether it rained or not. Devise a test that might 
reject the assumption (at the 5 percent significance level) that the Weather 
Bureau knows what it is talking about when it predicts rain with probability
50 percent. 


7. An experiment consists in tossing 2 coins. How many experiments 
should be performed if the relative frequency of 2 heads is to be between .23 
and .27 inclusive with probability 98 percent? 


8. As in Exercise 7, how many experiments should be performed if the 
relative frequency of 2 heads is to be .24 or higher, with probability 95 
percent? 


9a. It is claimed that a coin has probability .2 or less of landing heads. 
You are willing to toss the coin 25 times. Devise a test to determine, at 
the 10 percent significance level, whether to reject the assumption 
p = .2. 
b. Similarly, devise a test to reject the assumption p = .2. 


10. One thousand people are sampled at a certain hour and it is found that 
120 of them resented being polled. Find a 90 percent confidence interval for 
the percentage of people who resent being sampled at that hour. (Assume 
that the 1,000 people were selected independently and at random. This 
implies that 1 person could be sampled more than once—a dangerous pro- 
cedure for people who are irate at being polled.) 


11. A manufacturer sends a company a large shipment of nails. He 
claims that 20 percent of the nails are made of aluminum. The company 
president chose 150 nails at random and found only 20 aluminum nails. 
Granted that the president should have devised his experiment first, should 
he claim that he did not get the right amount of aluminum nails? 


12. In Exercise 11 find a 95 percent confidence interval for the percent- 
age of aluminum nails in the shipment. Use the method of Example 28. 


*6 POISSON DISTRIBUTION 


The Poisson distribution arises so often in practice that its values have
been extensively tabulated. Our approach is to treat it as the limiting value
of a binomial distribution as n → ∞ and p → 0, where np = λ, some fixed number.


29 Theorem 
Let λ be a fixed number and let x ≥ 0 be an integer. Then

$$\lim_{n\to\infty} b\left(x; n, \frac{\lambda}{n}\right) = \frac{\lambda^x e^{-\lambda}}{x!} \tag{7.18}$$
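Proof. Let p = λ/n. By definition,

$$b\left(x; n, \frac{\lambda}{n}\right) = \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\,\frac{\lambda^x}{n^x}\,\frac{q^n}{q^x} = \frac{\lambda^x}{x!}\left(1-\frac{\lambda}{n}\right)^n \frac{n!}{(n-x)!\,n^x}\,\frac{1}{q^x} \tag{7.19}$$

As n → ∞, we have [1 − (λ/n)]^n → e^(−λ), by Equation 4.6. Furthermore,

$$\frac{n!}{n^x(n-x)!} = \frac{n(n-1)\cdots(n-x+1)}{n^x} = \frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-x+1}{n} = 1\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)$$

Since x is held fixed and each factor approaches 1 as n → ∞, we have

$$\frac{n!}{n^x(n-x)!} \to 1 \quad \text{as } n \to \infty \tag{7.20}$$

Also, q^x = [1 − (λ/n)]^x → 1 as n → ∞, because x is held fixed. Hence, taking limits in Equation 7.19, we obtain

$$\lim_{n\to\infty} b\left(x; n, \frac{\lambda}{n}\right) = \frac{\lambda^x}{x!}\,e^{-\lambda}\,(1)(1) = \frac{\lambda^x e^{-\lambda}}{x!}$$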
This is the result. 
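Theorem 29 can be watched at work numerically. The Python sketch below holds λ = 1.5 and x = 2 (the values of Example 31, chosen here only for illustration) and prints b(x; n, λ/n) together with its distance from λ^x e^(−λ)/x! as n grows.

```python
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

lam, x = 1.5, 2
errors = []
for n in (10, 100, 1_000, 10_000):
    b = binom_pmf(x, n, lam / n)
    errors.append(abs(b - poisson_pmf(x, lam)))
    print(f"n = {n:>6}: b = {b:.6f}, |b - p(x; lam)| = {errors[-1]:.2e}")
```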
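30 Definition

The Poisson distribution, with parameter λ, is the distribution p(x; λ) defined for x = 0, 1, ..., n, ... by the formula

$$p(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \tag{7.21}$$

For small values of x, we have

$$p(0;\lambda) = e^{-\lambda}, \qquad p(1;\lambda) = \lambda e^{-\lambda}, \qquad p(2;\lambda) = \frac{\lambda^2}{2}\,e^{-\lambda}$$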
Equation 7.18 merely states that b(x; n, λ/n) → p(x; λ) as n → ∞. Thus we have

$$b(x; n, \lambda/n) \approx p(x;\lambda) \quad \text{if } n \text{ is large} \tag{7.22}$$

Equivalently, setting p = λ/n, we have

$$b(x; n, p) \approx p(x;\lambda) \quad (\lambda = np) \tag{7.23}$$

(if n is large, p is small, and λ = np is moderately sized.)

The approximation 7.22 is useful when Bernoulli trials are repeated a large number of times but the probability p of a success on each trial is small. In that case the probability of x successes is easily approximated, as in Equation 7.23, by the Poisson distribution p(x; λ) with λ = np. The values of p(x; λ) are tabulated for various values of λ and x in Appendix F.


31 Example 

The probability of winning a certain game is .01. a. If you play the game 150 
times, what is the probability that you will win exactly twice? b. What is the 
probability that you will win 2 or more times? 


Method. a. Here p = .01, n = 150. We assume that the games are independent. The required answer is b(2; 150, .01). Here λ = np = 150 × .01 = 1.5. By Equation 7.23 the Poisson approximation is

$$b(2; 150, .01) \approx p(2; 1.5)$$

Referring to Appendix F, under the column λ = 1.5 we find p(2; 1.5) = .2510. This is the required answer. [Note: Actually p(2; 1.5) = [(1.5)²/2]e^(−1.5). Appendix F merely evaluates this number.]

b. It is easier to compute the probability of 1 or 0 wins and take complements. Thus b(1; 150, .01) + b(0; 150, .01) ≈ p(1; 1.5) + p(0; 1.5) = .3347 + .2231 = .5578. Thus the probability of 2 or more wins is approximately 1 − .5578 = .4422. This completes the problem.
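To see how good the Poisson approximation is in Example 31, one can compute the exact binomial values alongside it. A minimal Python sketch, assuming independent games as the text does:

```python
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

n, p = 150, 0.01
lam = n * p  # 1.5

# a. exactly 2 wins
exact_a = binom_pmf(2, n, p)
approx_a = poisson_pmf(2, lam)

# b. 2 or more wins, via the complement of 0 or 1 wins
exact_b = 1 - (binom_pmf(0, n, p) + binom_pmf(1, n, p))
approx_b = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))

print(f"a. exact {exact_a:.4f}, Poisson {approx_a:.4f}")
print(f"b. exact {exact_b:.4f}, Poisson {approx_b:.4f}")
```

The exact and approximate values agree to two decimal places in both parts.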

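Note that we have assumed that p(x; λ) is, for fixed λ, a distribution. This requires that

$$p(0;\lambda) + p(1;\lambda) + \cdots + p(n;\lambda) + \cdots = \sum_{x=0}^{\infty} p(x;\lambda) = 1 \tag{7.24}$$

[An infinite sum is required because p(x; λ) is defined for all nonnegative integers, not just finitely many.] But this equation is true, because by Equation 4.5 we have

$$\begin{aligned} p(0;\lambda) + p(1;\lambda) + \cdots + p(n;\lambda) + \cdots &= e^{-\lambda} + \lambda e^{-\lambda} + \frac{\lambda^2}{2!}e^{-\lambda} + \cdots + \frac{\lambda^n}{n!}e^{-\lambda} + \cdots \\ &= e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots + \frac{\lambda^n}{n!} + \cdots\right) = e^{-\lambda}\cdot e^{\lambda} = 1 \end{aligned}$$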
2! n! 
Equation 7.24 was implicitly used above in part b. We wanted p(2; 1.5) + p(3; 1.5) + ···. This is [p(0; 1.5) + p(1; 1.5) + p(2; 1.5) + ···] − p(0; 1.5) − p(1; 1.5) = 1 − [p(0; 1.5) + p(1; 1.5)]. It was this number that we computed in part b.

Which would you rather play: a game with probability .01 for 150 times, or a game with probability .001 for 1,500 times? In each game the expected number of wins is 1.5. Theorem 29 shows that not only do the 2 have the same means, but they also have, approximately, the same distributions. In each case the probability of x wins is approximately p(x; 1.5). The Poisson distribution is determined by 1 parameter, λ, unlike the binomial distribution, which needs 2: n and p.

Since λ = np is the mean of the binomial distribution, it is natural to say that the mean of the Poisson distribution is its parameter λ. Similarly, σ = √(npq) = √(λq) ≈ √λ (because q = 1 − (λ/n) ≈ 1 if n is large). Therefore, it is also natural to say that the standard deviation of the Poisson distribution is √λ. Both of these results may be proved directly using extensions of
Equations 5.15 and 6.18 for infinite sums. 
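Equation 7.24 and the statements that the Poisson distribution has mean λ and standard deviation √λ can be checked numerically by truncating the infinite sums. The Python sketch below stops after 80 terms, an assumption that is harmless for the moderate values of λ used here because the omitted terms are negligibly small; `truncated_moments` is an illustrative name.

```python
import math

def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

def truncated_moments(lam, terms=80):
    """Total probability, mean, and standard deviation from the first `terms` values."""
    probs = [poisson_pmf(x, lam) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * px for x, px in enumerate(probs))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return total, mean, math.sqrt(var)

for lam in (1.5, 2, 10):
    total, mean, sd = truncated_moments(lam)
    print(f"lam = {lam}: total {total:.6f}, mean {mean:.6f}, "
          f"sd {sd:.6f}, sqrt(lam) {math.sqrt(lam):.6f}")
```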


32 Example 

An insurance company has insured 10,000 homes against fire. Assuming 
that the probability of fire in a house during a year is .0002 and that all fires 
are independent, find the probability that the company will have to pay off 
on 4 or more homes during a year. 


Method. Here n = 10,000, p = .0002, so λ = np = 2. The probability of 3
or fewer fires during a year is p(0; 2) + p(1; 2) + p(2; 2) + p(3; 2) = .8571.
Hence the required probability is 1 − .8571 = .1429.
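Example 32 in Python, as a sketch that assumes independent fires as the text does; the exact binomial sum is included for comparison.

```python
import math

def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10_000, 0.0002
lam = n * p  # 2.0

at_most_3_poisson = sum(poisson_pmf(x, lam) for x in range(4))
at_most_3_exact = sum(binom_pmf(x, n, p) for x in range(4))

print(f"P(4 or more): Poisson {1 - at_most_3_poisson:.4f}, exact {1 - at_most_3_exact:.4f}")
```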

One of the standard uses of the Poisson distribution is for continuous
processes of a certain type. Example 33 gives one such application.


33 Example 

A Geiger counter is placed near a radioactive substance in such a way that 
it registers an average of 10 counts per minute. (Each count occurs when an 
emitted particle hits the Geiger counter.) During a fixed minute, compute 
the probability that 7 or more counts occur. 


Method. Clearly we cannot proceed unless some physical and mathematical assumptions are made. (Ultimately these assumptions are tested by experiment.) It appears that particles are emitted randomly. Let us divide the minute into a large number n of equal parts. We shall assume that the process of emitting particles is a random procedure and that the probability of a particle hitting the Geiger counter in any one of these small periods of time is p. (p is small.) We also assume that hits in any 2 of these small time intervals are independent. We also assume that the time intervals are so small that 2 hits in one of these intervals cannot occur, or happen so infrequently that we can ignore the possibility. In that case we have approximated this process by a binomial distribution with parameters n and p. (A hit is regarded as a success.) The hypothesis is that np = 10, so the distribution of the number x of hits (total hits during a minute) is approximated by the Poisson distribution p(x; 10). Here we expect a better approximation as n → ∞, because presumably the possibility of 2 hits in an interval will become more and more negligible as the minute is divided more finely. It is therefore reasonable to regard p(x; 10) as exactly equal to the distribution for the number of hits in 1 minute.

The required answer is therefore p(7) + p(8) + ···, where we have
written p(x; 10) = p(x). Since p(0) + ··· + p(6) = .1302, from Appendix F,
the required probability is 1 − .1302 = .8698.
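The Geiger-counter computation in Python, assuming the Poisson model p(x; 10) adopted above. Summing the seven probabilities directly gives about .1301; rounding each one to four places first, as a reader of a four-place table would, gives the .1302 quoted in the text, so the two answers differ only by table rounding.

```python
import math

def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 10
probs = [poisson_pmf(x, lam) for x in range(7)]   # p(0), ..., p(6)

direct = sum(probs)
table_style = sum(round(px, 4) for px in probs)   # four-place entries, as in a table

print(f"p(0)+...+p(6): direct {direct:.4f}, from rounded entries {table_style:.4f}")
print(f"P(7 or more counts) = {1 - direct:.4f}")
```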

The method of this example applies to continuous processes in which a 
finite number of successes occur in a given time. As Example 33 shows, we 
require independence of success in different time intervals and a negligible 
probability of simultaneous successes. In these cases the probability of x
successes in a given time interval is p(x; λ), where λ is regarded as the
average number of successes. For, as before, we may break up the given
time interval into a large number n of pieces so that the probability of a
success in each of these pieces is the same value p. Ignoring the possibility
of 2 successes in any 1 of these pieces, or a success at the endpoints of
these pieces, we have n Bernoulli trials with a small probability p of success
on each trial. Letting n → ∞ and assuming that np = λ in all cases, we have
a Poisson distribution in the limit. We shall not consider the
precise conditions that a continuous process must satisfy to be a Poisson 
distribution. 

Some examples where a Poisson distribution might apply (in the absence
of other evidence) are the number of telephone calls in an office from
2 P.M. to 5 P.M. (λ = the average number of calls in this period), the number
of accidents in the United States during the hours 12 noon through 12
midnight on a Saturday (here, a multiple accident, such as a pileup, must be
counted as only 1 accident to avoid too many simultaneous “successes”),
and the number of wrong notes played by a pianist during an hour’s
concert (although independence of the occurrence of wrong notes might
depend upon the performer’s experience and personality).

The reader is cautioned against assuming that any continuous random 
process that yields a finite number of successes is governed by a Poisson 
distribution. In the examples above we do not expect a Poisson distribution 
for the number of office calls if the office has only 1 telephone and a very 
talkative secretary. Can you explain why? 
EXERCISES


Where appropriate, use the Poisson approximation to the binomial 
distribution. 


1. Assume that 8 percent of the population is colorblind. If 50 people 
are chosen at random, what is the probability that at most 1 is colorblind?


2. Using Appendix F, sketch the graphs of y = p(x; 2) and y = p(x; 4). 


3. A marksman has probability .01 of hitting the bulls-eye of a target. 
Assuming independence of trials, what is the probability that he hits the 
bulls-eye at least once in 50 shots? How many shots should he take so that
he will hit the bulls-eye at least once, with probability .90 or more? 


4. A manufacturing process produces a defective part with probability 
.001. A quality-control engineer will test 3,000 of these parts. He wants a 
number N such that the probability of N or more defectives is smaller than
.05. Find N.


5. Assume that the probability of a twin birth is .01. During the next 250 
deliveries at the hospital, what is the probability of 2 or more twin deliveries? 


6. Prove: p(x + 1; λ) = [λ/(x + 1)] p(x; λ).


7. For fixed λ, determine the value or values of x at which p(x; λ) is as
large as possible. (Hint: Use Exercise 6.) 


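8. Prove:

$$p(0;\lambda_1)\,p(0;\lambda_2) = p(0;\lambda_1+\lambda_2)$$

$$p(0;\lambda_1)\,p(1;\lambda_2) + p(0;\lambda_2)\,p(1;\lambda_1) = p(1;\lambda_1+\lambda_2)$$

and, in general,

$$\sum_{n=0}^{x} p(n;\lambda_1)\,p(x-n;\lambda_2) = p(x;\lambda_1+\lambda_2)$$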


9. Algebra textbooks show that there can be no construction for the tri- 
section of an angle using straight edge and compass in certain specified 
ways. Despite this, every year a few “constructions” are published that 
prove incorrect in one detail or another.

a. Give an argument in favor of the hypothesis that the number of such 
incorrect constructions per year has a Poisson distribution. Also give 
an argument against this hypothesis. 

b. Assume that the number of published faulty constructions has a 
Poisson distribution, with an average of 6 constructions per year. What 
is the probability that, next year, no angle-trisection constructions are 
published? 
From DIFFERENTIAL AND INTEGRAL CALCULUS, Houghton Mifflin. Copyright ©,
1960, by James R. F. Kent, pages 470-473. 
APPENDIX B
VALUES 


Adapted from DIFFERENTIAL AND INTEGRAL CALCULUS, Houghton Mifflin. 
Copyright ©, 1960, by James R. F. Kent, pages 470-473. 
VALUES OF e^x


9.4877 
9.5831 
9.6794 
9.7767 
9.8749 


9.9742 
10.074 
10.176 
10.278 
10.381 


10.486 
10.591 
10.697 
10.805 
10.913 


11.023 
11.134 
11.246 
11.359 
11.473 


11.588 
11.705 
11.822 
11.941 
12.061 


12.182 
12.807 
13.464 
14.154 


14.880 
15.643 
16.445 
17.288 
18.174 


19.106 
20.086 
21.115 
22.198 
23.336 


24.533 
25.790 
27.113 
28.503 
29.964 


298.87 
314.19 
330.30 
347.23 
365.04 


383.75 
403.43 
424.11 
445.86 
468.72 


492.75 
518.01 
544.57 
572.49 
601.85 


632.70 
665.14 
699.24 
735.10 
772.78 


812.41 
854.06 
897.85 
943.88 
992.27 


1043.1 


1096.6 
2981.0 
SQUARE
ROOTS 




1.0000 
1.4142 
1.7321 
2.0000 
2.2361 


2.4495 
2.6458 
2.8284 
3.0000 
3.1623 


3.3166 
3.4641 
3.6056 
3.7417 
3.8730 


4.0000 
4.1231 
4.2426 
4.3589 
4.4721 


4.5826 
4.6904 
4.7958 
4.8990 
5.0000 


5.0990 
5.1962 
5.2915 
5.3852 
5.4772 


5.5678 
5.6569 
5.7446 
5.8310 
5.9161 


3.1623 
4.4721 
5.4772 
6.3246 
7.0711 


7.7460 
8.3666 
8.9443 
9.4868 
10.000 


10.488 
10.954 
11.402 
11.832 
12.247 


12.649 
13.038 
13.416 
13.784 
14.142 


14.491 
14.832 
15.166 
15.492 
15.811 


16.125 
16.432 
16.733 
17.029 
17.321 


17.607 
17.889 
18.166 
18.439 
18.708 


18.974 
19.235 
19.494 
19.748 
20.000 


20.248 
20.494 
20.736 
20.976 
21.213 


21.448 
21.679 
21.909 
22.136 
22.361 


22.583 
22.804 
23.022 
23.238 
23.452 


23.664 
23.875 
24.083 
24.290 
24.495 


24.698 
24.900 
25.100 
25.298 
25.495 


25.690 
25.884 
26.077 
26.268 
26.458 


8.4261 
8.4853 
8.5440 
8.6023 
8.6603 


8.7178 
8.7750 
8.8318 
8.8882 
8.9443 


9.0000 
9.0554 
9.1104 
9.1652 
9.2195 


9.2736 
9.3274 
9.3808 
9.4340 
9.4868 


9.5394 
9.5917 
9.6437 
9.6954 
9.7468 


9.7980 
9.8489 
9.8995 
9.9499 


26.646 
26.833 
27.019 
27.203 
27.386 


27.568 
27.749 
27.928 
28.107 
28.284 


28.460 
28.636 
28.810 
28.983 
29.155 


29.326 
29.496 
29.665 
29.833 
30.000 


30.166 
30.332 
30.496 
30.659 
30.822 


30.984 
31.145 
31.305 
31.464 


Reprinted from ELEMENTS OF FINITE PROBABILITY by J. L. Hodges and E. Lehmann, 
by permission of the publisher, Holden-Day, Inc., page 219. 
APPENDIX D
AREAS UNDER THE NORMAL CURVE 


The table entry is N(z), the area under the standard normal
curve from 0 to z. The area from z to ∞ is .5000 − N(z). The area
from −z to z is 2N(z).
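Entries of this table can be regenerated from the error function, since N(z) = ½ erf(z/√2). A short Python sketch using `math.erf` from the standard library, evaluated at the z values used in the chapter:

```python
import math

def N(z):
    """Area under the standard normal curve from 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (1.28, 1.65, 1.96, 2.33, 2.58):
    print(f"N({z}) = {N(z):.4f}   area from -z to z = {2 * N(z):.4f}   "
          f"area beyond z = {0.5 - N(z):.4f}")
```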


From Mosteller, Rourke and Thomas, PROBABILITY WITH STATISTICAL APPLICA- 
TIONS, 2nd ed., 1970, Addison-Wesley, Reading, Mass., page 473. 
BINOMIAL DISTRIBUTION

$$b(x; n, p) = \binom{n}{x} p^x q^{n-x}$$

Columns give p = .1, .2, .3, .4, .5.

 n   x   .1      .2      .3      .4      .5
 2   1   .1800   .3200   .4200   .4800   .5000
     2   .0100   .0400   .0900   .1600   .2500
 3   0   .7290   .5120   .3430   .2160   .1250
     1   .2430   .3840   .4410   .4320   .3750
     2   .0270   .0960   .1890   .2880   .3750
     3   .0010   .0080   .0270   .0640   .1250
 4   0   .6561   .4096   .2401   .1296   .0625
     1   .2916   .4096   .4116   .3456   .2500
     2   .0486   .1536   .2646   .3456   .3750
     3   .0036   .0256   .0756   .1536   .2500
     4   .0001   .0016   .0081   .0256   .0625
 5   0   .5905   .3277   .1681   .0778   .0312
     1   .3280   .4096   .3602   .2592   .1562
     2   .0729   .2048   .3087   .3456   .3125
     3   .0081   .0512   .1323   .2304   .3125
     4   .0004   .0064   .0284   .0768   .1562
     5   .0000   .0003   .0024   .0102   .0312
 6   0   .5314   .2621   .1176   .0467   .0156
     1   .3543   .3932   .3025   .1866   .0938
     2   .0984   .2458   .3241   .3110   .2344
     3   .0146   .0819   .1852   .2765   .3125
     4   .0012   .0154   .0595   .1382   .2344
     5   .0001   .0015   .0102   .0369   .0938
     6   .0000   .0001   .0007   .0041   .0156
 7   0   .4783   .2097   .0824   .0280   .0078
     1   .3720   .3670   .2471   .1306   .0547
     2   .1240   .2753   .3177   .2613   .1641
     3   .0230   .1147   .2269   .2903   .2734
     4   .0026   .0287   .0972   .1935   .2734
     5   .0002   .0043   .0250   .0774   .1641
     6   .0000   .0004   .0036   .0172   .0547
     7   .0000   .0000   .0002   .0016   .0078
 8   0   .4305   .1678   .0576   .0168   .0039
From Brunk, AN INTRODUCTION TO MATHEMATICAL STATISTICS, 1960, Ginn & Co., pages 363-365.
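Any entry of the binomial tables can be recomputed from the formula in the table heading. This is a minimal sketch using `math.comb` (Python 3.8+); the function name `b` simply mirrors the notation b(x; n, p) and is not taken from any library.

```python
from math import comb


def b(x: int, n: int, p: float) -> float:
    """Binomial probability b(x; n, p) = C(n, x) p^x q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


if __name__ == "__main__":
    # Reproduce the n = 7 rows (columns p = .1, .2, .3, .4, .5).
    for x in range(8):
        print(x, "  ".join(f"{b(x, 7, p):.4f}" for p in (0.1, 0.2, 0.3, 0.4, 0.5)))
```

Rounding each value to four decimals gives the printed entries; the probabilities in any one (n, p) column add to 1.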
BINOMIAL DISTRIBUTION
$b(x; n, p) = \binom{n}{x} p^x q^{n-x}$

 n   x   p=.1     .2     .3     .4     .5
 9   0  .3874  .1342  .0404  .0101  .0020
     1  .3874  .3020  .1556  .0605  .0176
     2  .1722  .3020  .2668  .1612  .0703
     3  .0446  .1762  .2668  .2508  .1641
     4  .0074  .0661  .1715  .2508  .2461
     5  .0008  .0165  .0735  .1672  .2461
     6  .0001  .0028  .0210  .0743  .1641
     7  .0000  .0003  .0039  .0212  .0703
     8  .0000  .0000  .0004  .0035  .0176
     9  .0000  .0000  .0000  .0003  .0020
10   0  .3487  .1074  .0282  .0060  .0010
     1  .3874  .2684  .1211  .0403  .0098
     2  .1937  .3020  .2335  .1209  .0439
     3  .0574  .2013  .2668  .2150  .1172
     4  .0112  .0881  .2001  .2508  .2051
     5  .0015  .0264  .1029  .2007  .2461
     6  .0001  .0055  .0368  .1115  .2051
     7  .0000  .0008  .0090  .0425  .1172
     8  .0000  .0001  .0014  .0106  .0439
     9  .0000  .0000  .0001  .0016  .0098
    10  .0000  .0000  .0000  .0001  .0010
11   0  .3138  .0859  .0198  .0036  .0005
     1  .3835  .2362  .0932  .0266  .0054
     2  .2131  .2953  .1998  .0887  .0269
     3  .0710  .2215  .2568  .1774  .0806
     4  .0158  .1107  .2201  .2365  .1611
     5  .0025  .0388  .1321  .2207  .2256
     6  .0003  .0097  .0566  .1471  .2256
     7  .0000  .0017  .0173  .0701  .1611
     8  .0000  .0002  .0037  .0234  .0806
     9  .0000  .0000  .0005  .0052  .0269
    10  .0000  .0000  .0000  .0007  .0054
    11  .0000  .0000  .0000  .0000  .0005
12   0  .2824  .0687  .0138  .0022  .0002
BINOMIAL DISTRIBUTION
$b(x; n, p) = \binom{n}{x} p^x q^{n-x}$
APPENDIX F
POISSON DISTRIBUTION
$p(x; \lambda) = \lambda^x e^{-\lambda} / x!$


 x  λ=1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0    6.0    7.0    8.0    9.0   10.0
 0  .2231  .1353  .0821  .0498  .0302  .0183  .0111  .0067  .0025  .0009  .0003  .0001  .0000
 1  .3347  .2707  .2052  .1494  .1057  .0733  .0500  .0337  .0149  .0064  .0027  .0011  .0005
 2  .2510  .2707  .2565  .2240  .1850  .1465  .1125  .0842  .0446  .0223  .0107  .0050  .0023
 3  .1255  .1804  .2138  .2240  .2158  .1954  .1687  .1404  .0892  .0521  .0286  .0150  .0076
 4  .0471  .0902  .1336  .1680  .1888  .1954  .1898  .1755  .1339  .0912  .0573  .0337  .0189
 5  .0141  .0361  .0668  .1008  .1322  .1563  .1708  .1755  .1606  .1277  .0916  .0607  .0378
 6  .0035  .0120  .0278  .0504  .0771  .1042  .1281  .1462  .1606  .1490  .1221  .0911  .0631
 7  .0008  .0034  .0099  .0216  .0385  .0595  .0824  .1044  .1377  .1490  .1396  .1171  .0901
 8  .0001  .0009  .0031  .0081  .0169  .0298  .0463  .0653  .1033  .1304  .1396  .1318  .1126
 9         .0002  .0009  .0027  .0066  .0132  .0232  .0363  .0688  .1014  .1241  .1318  .1251
10                .0002  .0008  .0023  .0053  .0104  .0181  .0413  .0710  .0993  .1186  .1251
11                       .0002  .0007  .0019  .0043  .0082  .0225  .0452  .0722  .0970  .1137
12                       .0001  .0002  .0006  .0016  .0034  .0113  .0264  .0481  .0728  .0948
13                              .0001  .0002  .0006  .0013  .0052  .0142  .0296  .0504  .0729
14                                     .0001  .0002  .0005  .0022  .0071  .0169  .0324  .0521
15                                            .0001  .0002  .0009  .0033  .0090  .0194  .0347
16                                                          .0003  .0014  .0045  .0109  .0217
17                                                          .0001  .0006  .0021  .0058  .0128
18                                                                 .0002  .0009  .0029  .0071
19                                                                 .0001  .0004  .0014  .0037
20                                                                        .0002  .0006  .0019
21                                                                        .0001  .0003  .0009
22                                                                               .0001  .0004
23                                                                                      .0002
24                                                                                      .0001


Based on Brunk, AN INTRODUCTION TO MATHEMATICAL STATISTICS, 1960, 
Ginn & Co., pages 221, 371-374. 
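The Poisson entries follow from the formula in the table heading. A minimal sketch (the function name is illustrative); blank entries in the table correspond to probabilities that round to .0000 at four decimals.

```python
import math


def poisson(x: int, lam: float) -> float:
    """Poisson probability p(x; lambda) = lambda^x e^(-lambda) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)


if __name__ == "__main__":
    # The x = 0 row of the table, one entry per tabulated lambda.
    lams = (1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
    print(" ".join(f"{poisson(0, lam):.4f}" for lam in lams))
```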
Answers to Selected Odd-Numbered Exercises


CHAPTER 1 


Section 1

1. f, = 119/600 = .198; f, = 203/600 = .338; f. = 278/600 = .463. 

3. 1945: $f_M$ = 1,467/2,858 = .513, $f_F$ = 1,391/2,858 = .487; 1946: $f_M$ = .514, $f_F$ = .486; 1947, 1948, 1949: $f_M$ = .513, $f_F$ = .487. The large samples involved indicate that a male birth is somewhat more likely than a female birth.

5a. $s_1, s_2, s_3, s_4, s_5, s_6$, where $s_i$ is the outcome "high die is i."

7a. Hy, H,,..., Hi, where H; is the outcome ‘‘exactly i heads are tossed.”’ 

9a. There are infinitely many possible outcomes: $H_1, H_2, H_3, \ldots$, and $H_\infty$. $H_i$ is the outcome "the first head occurs on the ith trial," and $H_\infty$ is the event "a head never turns up," which is a theoretical possibility.


Section 2 
1i. Not a probability space, since .1 + .2 + .3 + .4 + .5 = 1.5 ≠ 1.0.
ii. A probability space. 
iii. A probability space. 
3. 1/S. 
5. p(A) = and p(B) =p(C) =§. 
7i. No, since statistical probabilities are limits of $f_N$ as N tends to infinity.
ii. Yes, because the p(s;) satisfy the conditions 1.4 and 1.5. 
9. 1/39. 


Section 3 

1. $A_1 = \emptyset$, $A_2 = \{X\}$, $A_3 = \{Y\}$, $A_4 = \{Z\}$, $A_5 = \{X, Y\}$, $A_6 = \{X, Z\}$, $A_7 = \{Y, Z\}$, and $A_8 = \{X, Y, Z\} = S$.

3. The event is A = {2,4, 6}. p(A) = .624. The relative frequency of A in 
the first 100 trials was .61. The relative frequency in the second 100 trials 
was .63. 

5. The event is A = {7,11}. p(A) = .223. The relative frequency of A was 
.244. p({6, 7, 8}) = .445. 

7. The event is {9, 19, 29, 39, 49, 59, 69, 79, 89, and 90 through 99}. The 
probability is 19/100 = .19. 

9a. 3/7 
e. .10
13a. n(A) =3. p(A) = .8 
b. .5 
c. $\sum_{s \in \{x, y, z\}} p(s)$
Section 4 
1. 11/36 = .306
[Figure: grid of the 36 outcomes (i, j) for two dice, with rows and columns labeled 1 through 6.]
3. 20/36 = .556 
5a. 16/32 = .5 
b. 8/32 = .25 
7. 11/32 = .344 


9. The sample points are HHH, HHT, HTH, THH, HTT, THT, TTH, 
and TTT. The probabilities are 4, 3, 3, and 3 respectively. 

11a. 1/3
b. 2/5

c. 1/3

d. 3/10 

e. 1/6 

13. The probabilities are respectively 1/36, 10/36, and 25/36. To three 
decimal places, these are .028, .278, and .694. 


Section 5 
1. p(Manhattan) = 1,698/7,782 = .218 
p(Bronx) = 1,425/7,782 = .183
p(Brooklyn) = 2,627/7,782 = .338
p(Queens) = 1,810/7,782 = .233
p(Richmond) = 222/7,782 = .029
3. There are 25 prime numbers between 1 and 100 inclusive.
CHAPTER 2


Section 1
1. p = 6/16 = .375

     HH    HT    TH    TT
HH  HHHH  HHHT  HHTH  HHTT
HT  HTHH  HTHT  HTTH  HTTT
TH  THHH  THHT  THTH  THTT
TT  TTHH  TTHT  TTTH  TTTT

3. The probability that both are green is 42/341 = .123. The probability 
that both have the same color is 162/341 = .475. 

5. n(three-letter words) = $26^3$ = 17,576. n(three-letter words of the type consonant-vowel-consonant) = 21 · 5 · 21 = 2,205.

7. Let A = {N, S, E, W} (the points of a compass), and let D = {F, L, R} (forward, left, and right). Then Paths = A × D × D × D × D, and n(Paths) = $4 \cdot 3^4$ = 324. The man will be unable to stroll on a different path every day for a year. First path = ELRRF. Second path = WFRFR.

9. There are $3^3$ = 27 different ways of answering, and there are 30 students.
Therefore, some tests will be identical. 

11a. $7^4$ = 2,401
b. $7^3$ = 343

13. It is possible to produce the same amounts in different ways. For example, 3¢ = 1¢ + 2¢. This was not possible in Exercise 12.


Section 2 
1. 3/7 = .429 
3. $(20 \cdot 19 \cdot 18)/(23 \cdot 22 \cdot 21) = .644$
5. $(12 \cdot 11 \cdot 10 \cdot 9)/12^4 = .573$
7. $(6 \cdot 5 \cdot 4 \cdot 3 \cdot 2)/6^5 = 5/54 = .093$
9a. 4/22,100 = .0002 
b. 44/22,100 = .002 
c. 1,096/22,100 = .050 
d. 720/22,100 = .033 
e. 52/22,100 = .002 
f. 3,744/22,100 = .169 
g. 16,440/22,100 = .744 
11. $p_0 + p_1 + p_2 + p_3 = 1$, and $p_1 = p_2$, $p_0 = p_3 = 2/17$. Therefore, $1 = 2p_0 + 2p_1 = 4/17 + 2p_1$. Solving for $p_1$, we find $p_1 = 13/34$.
13a. 1,260 
b. 780 
c. 660 
d. 1,100 
e. 60
f. 114 


Section 3 


720
336
20
117,600
220
74/81
230,300
5
720
70
12/11!
17!/(10! 7!)
19!/(11! 8!)
495
171

9a. Population = the 26 letters; size is indeterminate; ordered sample with 
replacement. 
b. Population = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; size 4; ordered sample with 
replacement. 
c. Population = 52 cards; size 5; can be ordered or unordered without 
replacement. 
d. Population = {1, 2, 3, 4, 5, 6}; size 8; ordered or unordered with re- 
placement. 
e. Population = {H, T}: size 10; ordered or unordered with replacement. 
f. Population = all students in the graduating class; size 5; ordered sample 
without replacement. 
g. Population = all baseball players; size 9; ordered or unordered without 
replacement. 
h. Population = all American League teams; size = number of teams in 
the American League; ordered sample without replacement. 
i. Population = all National League teams; size = number of places in the
first division; unordered sample without replacement. 




11. The number of hands is 2,598,960 = $\binom{52}{5}$. The probability is $\binom{13}{5} \big/ \binom{52}{5}$ = .0005.


13. 30! /10! 


Section 4 


9a. $1 + 100x + 4{,}950x^2$
b. $x^{50} - 50x^{49}y + 1{,}225x^{48}y^2$
c. $1 - 70t + 2{,}415t^2$
d. $p^n + np^{n-1}q + \frac{n(n-1)}{2}p^{n-2}q^2$


11a. .737
b. 1.010
c. 1.000
d. .968


24 
i (%) 
15. 36; 28 


Section 5 


5a. 1/1,820 = .0005
b. 24/455 = .053 
c. 64/455 = .141 

7. 1,260 


9. ve ' 

isa (7) /("7) 
» (S)(2)/(7) 
[S))+Ce)0) +(2)//(9) 
CHAPTER 3


Section 1 




{1, 2, 3, 4, 5, 6, 7, 8, 9}




Sq UP 

. not possible 

.P Sq 

. not possible 

. A= {d,e, f, g}; 
{b, d, g}; 

{a, b}; 




p(B) =. 


p(B ∩ C) = .40;
p(A ∩ B ∩ C) = .005;
p(A ∪ B) = .80
7. 12 percent 
9. p(at least one 6 among 3 dice) = 91/216 = .421; 
p(at least one 6 among 4 dice) = 671/1,296 = .518 
11. 0.4 
13. 280 
15a. 750/1,326 = .566 
b. 8,852/22,100 = .401 
17a. .2 ≤ p(A ∪ B) ≤ .3
b. .5 ≤ p(A ∪ B) ≤ .9
c. .7 ≤ p(A ∪ B) ≤ 1
d. .9<p(A UB
e. 95 <= p(A UB 
19a. 0 = p(A 2 B) 
B) 




<= | 


b. 

c. .3 ≤ p(A ∩ B)
d. B) 
e. .85 ≤ p(A ∩ B) ≤ .9


Section 2 
1. 30.8 percent 
3. .167/.667 = .250 
5. p(A|B) = p(A)/p(B) ≥ p(A), since p(B) ≤ 1 and p(B) is positive.


Equality holds if and only if p(A) = 0 or p(B) = 1 (A is impossible or B is 
certain). 


* (2)(3)/L(5)-(3)| 
"A2/\ 3 5 5 
9a. p(all aces) = 1/5,525 = .0002 
b. p(all aces| different suits) = 1/2,197 = .0005 
11. 9/230 = .039 
13. If p(B) = .9 and p(A) = .6, then 5/9 ≤ p(A|B) ≤ 6/9, so p(A|B) is between .556 and .667. If p(B) = .99, then 59/99 ≤ p(A|B) ≤ 60/99, so p(A|B) is between .596 and .606.
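For Exercise 13 of Section 2, the bounds on p(A|B) come from bounds on p(A ∩ B): it is at least p(A) + p(B) − 1 (and at least 0), and at most the smaller of p(A) and p(B). Dividing by p(B) bounds p(A|B). A small sketch with exact fractions; the function name is illustrative.

```python
from fractions import Fraction


def conditional_bounds(pA: Fraction, pB: Fraction) -> tuple[Fraction, Fraction]:
    """Smallest and largest possible p(A|B) when only p(A) and p(B) are known.

    p(A and B) lies between max(0, p(A) + p(B) - 1) and min(p(A), p(B));
    dividing both bounds by p(B) gives the bounds on p(A|B).
    """
    low = max(Fraction(0), pA + pB - 1) / pB
    high = min(pA, pB) / pB
    return low, high


if __name__ == "__main__":
    for pB in (Fraction(9, 10), Fraction(99, 100)):
        low, high = conditional_bounds(Fraction(3, 5), pB)
        print(pB, low, high, f"{float(low):.3f}", f"{float(high):.3f}")
```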


Section 3 
1. 3.84 percent 
3. p(faulty stamp) = .07; 
p(dealer X|faulty stamp) = 4/7 = .571 
5. 24/59 = .407
7. 4/13 = .308 
9. p(2-headed|three consecutive heads) = 8/17 = .471; 


p(head on 4th trial|three consecutive heads) = 25/34 = .735 
11. 9/19 = .474


13. 208/2,210 = .094 


Section 4 


1. p(red card) = 26/52 = 1/2; p(ace) = 4/52 = 1/13; p(red ace) = 2/52 = 1/26. Since (1/2)(1/13) = 1/26, the events are independent.

3. p(6 on first die) = 1/6; p(sum is 8) = 5/36; p(sum is 8 and a 6 is on first 
die) = 1/36. Since (1/6)(5/36) ≠ 1/36, the events are not independent.

5. n(A)n(B) = n(S)n(A ∩ B)

9a. 5/144 = .035 

b. 113/144 = .785 
The assumption is that the events are independent.
11. $1 - (1 - 1/365)^{10}$, or approximately 10/365
13. 1—(.9)” 
15. $p(A_1 \cup A_2 \cup A_3) = p_1 + p_2 + p_3 - (p_1p_2 + p_1p_3 + p_2p_3) + p_1p_2p_3$
$= p(A_1) + p(A_2) + p(A_3) - [p(A_1 \cap A_2) + p(A_1 \cap A_3) + p(A_2 \cap A_3)] + p(A_1 \cap A_2 \cap A_3)$
17. 192 
21. p(2 before 3) = $\frac{1}{2}[1 - (2/3)^{10}]$ = .491
p(no 2 or 3) = $(2/3)^{10}$ = .017
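Answer 15 of Section 4 expands p(A₁ ∪ A₂ ∪ A₃) for independent events. One quick check of the expansion is to compare it with the complement rule, 1 − (1 − p₁)(1 − p₂)(1 − p₃), since the union fails only when none of the three events occurs. The sketch below does this with exact fractions; the sample probabilities are arbitrary.

```python
from fractions import Fraction


def union_by_inclusion_exclusion(p1, p2, p3):
    """p(A1 u A2 u A3) for independent events, via inclusion-exclusion."""
    return p1 + p2 + p3 - (p1 * p2 + p1 * p3 + p2 * p3) + p1 * p2 * p3


def union_by_complement(p1, p2, p3):
    """The same probability via the complement: 1 - p(no event occurs)."""
    return 1 - (1 - p1) * (1 - p2) * (1 - p3)


if __name__ == "__main__":
    ps = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
    print(union_by_inclusion_exclusion(*ps), union_by_complement(*ps))
```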


Section 5 
9. No. It is necessary to know the value of $p(B_1)/p(B)$. $1/3 \le p(A|B) \le 1/2$. If $p(B_1) = p(B_2)$, then p(A|B) = 5/12. If $p(B_1) = 2p(B_2)$, then p(A|B) = 4/9.


p(sum = 7) = .18; 

p(sum = 7 or 11) = .26; 

p(high die is 6) = .36; 

p(either a 6 or a 1 appears) = .64
15. p(52,t,) = .20 


P(S2, ts) = 12 
P(Ss, t,) — .25 
P(53, t2) = .10 


P(53; tz) =. I5 


CHAPTER 4 
Section 1
1. .299 < Pigg < .366 
3. n ≥ 26 times.
5. .4288 
7. For n = 4, .1042. For n = 25, .4831


9. .7135 
11. .8118 < Py) < .8642 
15a. 1.11 

b. .905 

c. 2.718 

d. .368 


Section 2 
1. The probability that A hits the target first is

$\frac{p}{p + p' - pp'}$.

The probability that B hits the target first is

$\frac{p'(1 - p)}{p + p' - pp'}$.
3. A has the better chance of hitting the target first. His probability is 9/17.


5. p = .50: probability = .6 
p = .49 at $1 per game: probability = .58 
p = .49 at $.10 per game: probability = .36 
p = .51 at $1 per game: probability = .62 
p = .51 at $.10 per game: probability = .80 
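The two formulas in answer 1 of Section 2 come from summing a geometric series over rounds. This sketch assumes, as the formulas imply, that A fires first and the two alternate, with hit probabilities p and p′ on each shot. A full round ends with no hit with probability (1 − p)(1 − p′), so A's chance is p/[1 − (1 − p)(1 − p′)], and 1 − (1 − p)(1 − p′) = p + p′ − pp′. The function name is illustrative.

```python
from fractions import Fraction


def first_hit_probabilities(p, p_prime):
    """Return (p(A hits first), p(B hits first)) when A and B fire
    alternately, A first, with hit probabilities p and p_prime per shot.

    Each round (A then B) ends with no hit with probability
    (1 - p)(1 - p_prime); summing the geometric series gives the common
    denominator 1 - (1 - p)(1 - p_prime) = p + p_prime - p * p_prime.
    """
    denom = p + p_prime - p * p_prime
    return p / denom, p_prime * (1 - p) / denom


if __name__ == "__main__":
    print(first_hit_probabilities(Fraction(1, 2), Fraction(1, 2)))
```

Since the two numerators add up to the denominator, one of the two is certain to hit first whenever p + p′ > 0.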


Section 3 

1. .581 

3. .794 

5. 3,168

7. 2,200
11a. .3679

b. .0613 

c. .6321 

d. 1.0000 
Section 4
1. 19/27 in both cases. A possible symmetry in the first case is

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 4 & 3 & 2 & 6 & 5 \end{pmatrix}$$

A possible symmetry in the second case is

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 2 & 3 & 4 & 1 & 6 \end{pmatrix}$$

3. .088. A possible symmetry is

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 5 & 2 & 4 & 6 \end{pmatrix}$$


5. 739/1,632 = .453 
7. 176/1,275 = .138 
9. .1617 
11. The probabilities of landing at Y and Z are equal. 


CHAPTER 5 


Section 1

5. $\sum_{i=1}^{n} x_i^2 - 2a \sum_{i=1}^{n} x_i + na^2$


Section 2 


1. p(1) = .2; p(2) = .6; p(3) = .2 

3. p(2) = p(12) = 1/36 = .028;
p(3) = p(11) = 2/36 = .056;
p(4) = p(10) = 3/36 = .083;
p(5) = p(9) = 4/36 = .111;
p(6) = p(8) = 5/36 = .139;
p(7) = 6/36 = .167
5. p(0) = p(5) = 1/32 = .031;
p(1) = p(4) = 5/32 = .156;
p(2) = p(3) = 10/32 = .313
7. p(−1) = 37/38 = .974; p(35) = 1/38 = .026
9. p(0) = 1/2 = .500
p(1) = 2/7 = .286
p(2) = 1/7 = .143
p(3) = 2/35 = .057
p(4) = 1/70 = .014
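Answer 3 of Section 2 lists the distribution of the sum of two dice. It can be checked by enumerating all 36 equally likely outcomes. A minimal sketch, assuming two fair dice; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution() -> dict[int, Fraction]:
    """Exact distribution of the sum of two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


if __name__ == "__main__":
    for s, prob in dice_sum_distribution().items():
        print(s, prob, f"{float(prob):.3f}")
```

The same dictionary also gives the mean 7 and the variance 35/6, so the standard deviation of the sum is √(35/6) ≈ 2.42.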


Section 3 


1. Average temperature = 57.5°. The median is 56.5°. 

3. Average value of x = 3.5. Average value of $x^2$ = 91/6 = 15.17.

7. Average number of aces is 2/13 = .154. The average number of diamonds is 1/2.

9. The expected high die is 161/36 = 4.47. The expected low die is 91/36 = 2.53.
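Answer 9 of Section 3 can be checked by averaging the larger and the smaller face over all 36 outcomes of two fair dice. A minimal sketch; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_high_and_low() -> tuple[Fraction, Fraction]:
    """Expected larger and smaller face when two fair dice are thrown."""
    rolls = list(product(range(1, 7), repeat=2))
    high = Fraction(sum(max(r) for r in rolls), len(rolls))
    low = Fraction(sum(min(r) for r in rolls), len(rolls))
    return high, low


if __name__ == "__main__":
    high, low = expected_high_and_low()
    print(high, f"{float(high):.2f}", low, f"{float(low):.2f}")
```

Because the high die plus the low die is always the sum of the dice, the two expectations must add to 7.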


Section 4 
1. 15 
3a. 3.5 
b. 7 
c. 35 
5. 20/3 = 6.67
7. 101
9. E(X − 1) = 0; $E(X - 1)^2 = 1$; $E(2X + 1)^2 = 13$
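In answer 9 of Section 4, the third expectation can be checked against the first two using only linearity of expectation:

$$
\begin{aligned}
E(X - 1) = 0 &\implies E(X) = 1,\\
E(X - 1)^2 = E(X^2) - 2E(X) + 1 = 1 &\implies E(X^2) = 2,\\
E(2X + 1)^2 = 4E(X^2) + 4E(X) + 1 &= 8 + 4 + 1 = 13.
\end{aligned}
$$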


Section 5 
1. $8,280 
3. 2,401/1,296 = 1.85
5. 14. For k heads, the average number of tosses is $2^{k+1} - 2$.
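The value 14 in answer 5 of Section 5 agrees with the closed form 2^(k+1) − 2 for k = 3. Assuming the exercise asks for the expected number of fair-coin tosses until k heads appear in a row, the closed form follows from a one-step recursion. The sketch below is hedged accordingly; the function name and the optional parameter p are illustrative.

```python
from fractions import Fraction


def expected_tosses_for_run(k: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected number of tosses until k heads in a row first appear.

    If E_j is the expected wait for a run of j heads, then after reaching a
    run of j - 1, one more toss either completes the run (probability p) or
    resets it (probability 1 - p), so E_j = E_{j-1} + 1 + (1 - p) E_j,
    that is, E_j = (E_{j-1} + 1) / p, with E_0 = 0.
    """
    expected = Fraction(0)
    for _ in range(k):
        expected = (expected + 1) / p
    return expected


if __name__ == "__main__":
    print([int(expected_tosses_for_run(k)) for k in range(1, 6)])
```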
Section 6


3. No. 


Section 7 


1. The value is 9/13 = .69. A's correct strategy is (11/13)H + (2/13)T. This is also B's correct strategy.
3. The value is 3. A plays T, and B plays H.
5. The value is 17/8 = 2.125. A's correct strategy is to put out one finger with probability §, and two fingers with probability 4. This is also B's correct strategy.


Section 8 
5. 4.5 
CHAPTER 6 
Section 1 
1. μ = .5, σ = .5
3. μ = p, σ = √(p(1 − p)) = √(pq)
5. m = 75.9, s = 13.7
7. E(X?) — 30? — p3 
11. μ = 4.96, m = 4.99; σ = 1.15, s = 1.20


Section 2 
1. p = 15/16 = .9375
3. Change the extreme temperatures to 61.6° and 88.4°.
5. 99 < x < 101
7. probability = .75 


Section 3 
1. μ = 7, σ = 2.42
3. μ = 80, σ = 4
5. μ = 40, σ = 10.7
7. 6.3 
11. Expected winnings = $200, standard deviation = $110. 


Section 4 
1. E(X) = 1/4 = .25; σ(X) = √3/40 = .043
5. 25 experiments; 40 experiments
11a. μ = 1, σ = √2/2 = .707
b. μ = 1, σ = √5/10 = .224


Section 5 
1. Any No 2 pq/d*f 
3. N = 200,000 
CHAPTER 7


Section 1


1a. .15
b. .575
c. .175
d. .175
5. 5/12 = .417


Section 2 


1a. .0166
b. .8849
c. .0479
d. .8414
3. 9.670 ≤ x ≤ 10.330 with probability .9; 9.744 ≤ x ≤ 10.256 with probability .8.
5a. 16.2 
b. 61.04% passing grade; 77.28% for an A. 
7. q = 82; q = 98 for rare meetings. 


Section 3 
3. ./361 
5. Expected number = 210, standard deviation = 12.12 
7. .9298 


Section 4 


1. .7286 for 100 tosses; 
.8612 for 200 tosses; 
.9774 for 500 tosses 
3. .8230; 13 ≤ n ≤ 27 for probability larger than .90,
12 ≤ n ≤ 28 for probability larger than .95,
9 ≤ n ≤ 31 for probability larger than .99


5. .8790 

7. .2483 = probability that 10 or more favor B, 
.9452 = probability that 15 or fewer favor A 

9. .0516; N = 17

11. 40% or higher with probability .0951; 50% or higher with probability 
.0003. 


Section 5 
3. At the 10% significance level, reject the hypothesis that the coin is fair 
if the number of heads is not between 42 and 58 inclusive. At the 2% 
significance level, reject the hypothesis if the number of heads is not
between 38 and 62 inclusive. 
5. The 90% confidence interval is 55.25% ≤ p ≤ 64.75%.
The 98% confidence interval is 53.28% ≤ p ≤ 66.72%.
7. about 2,536
9a. Reject hypothesis if 9 or more heads turn up.
b. Reject hypothesis if 1 or no heads turn up.

11. Since the number of aluminum nails should be 9 or less away from the mean 20 with probability .9476, an event with very small probability (.0524) has occurred. The president could claim a minor miracle, or he could check his 20% claim.
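For Exercise 3 of Section 5, the acceptance ranges 42 to 58 and 38 to 62 can be reproduced with the normal approximation. This sketch assumes 100 tosses of a fair coin (mean 50, standard deviation 5) and a continuity correction, and picks the narrowest symmetric range whose approximate probability is at least 1 − α. It is one reasonable way to arrive at the stated ranges, not necessarily the text's own procedure, and the function names are hypothetical.

```python
import math


def N(z: float) -> float:
    """Area under the standard normal curve from 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2))


def acceptance_region(n: int, alpha: float, p: float = 0.5) -> tuple[int, int]:
    """Narrowest symmetric range mu - h .. mu + h of head counts whose
    normal-approximation probability (with continuity correction) is at
    least 1 - alpha. Assumes mu = n * p is a whole number."""
    mu = round(n * p)
    sigma = math.sqrt(n * p * (1 - p))
    h = 0
    while 2 * N((h + 0.5) / sigma) < 1 - alpha:
        h += 1
    return mu - h, mu + h


if __name__ == "__main__":
    for alpha in (0.10, 0.02):
        print(alpha, acceptance_region(100, alpha))
```

Heads outside the returned range lead to rejecting the hypothesis that the coin is fair.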


Section 6 
1. .0916 
3. .3935; about 250 shots 
5. .7127 


9b. .0025 


Aristotle, 15 
Average value, see Expectation 


Bayes’ Theorem, 100 
Bernoulli’s Theorem, 233 
Binomial coefficients, 64 
Binomial distribution, 252 
Binomial theorem, 63 
Births, by sex, 9 

Bridge, 78 


Chebyshev bounds, 220 
Chebyshev’s Theorem, 219 

Coin tossing, 25 

Combinations, 55 

Complement of sets, 82 
Conditional expectation, 180 
Conditional probability, 92 
Conditional relative frequency, 91 


D’Alembert, 27, 29 

Dependent events, 104 

Dice, high number thrown, 2, 3 
sum, 9, 23 


tossed till 6 occurs, 7, 8 
Distribution, 159
binomial, 252 
normal, 244 
Poisson, 270 

Division principle, 51 


Elementary event, 12 
Events, dependent, 104 
in a probability experiment, 17 
in a probability space, 18 
independent, 104-105
mutually exclusive, 83 
probability of, 18 
relative frequency of, 17 
Expectation, 168 
conditional, 180 
units of, 177 
Expected value, see Expectation 


Finite sets as sample space, 32 
Frequency, 4 


Games, theory of, 197 
Histograms, 161 


Inclusion-exclusion principle, 140 
Independence of events, 104 

of random variables, 188 
Independent trials, 226 
Intersection of sets, 82 


Joint distribution, 186 


Law of averages, 6 
Law of Large Numbers, 232 
Limit, defined, 129 


Mean value, see Expectation 
Median, 171 

Miracles, 7 

Multinomial Theorem, 72 
Multiplication principle, 45-46


Normal distribution, 244 


Partitions, 114 


INDEX 


Pascal’s triangle, 62 
Permutations, 54 
Poisson distribution, 270 
Poker, 76 
Polling, 27, 254 
Population, 52 
Probability, conditional, 92 
definition, 12 
of an event, 18 
statistical definition, 5 
Probability density, 237 
See also Distribution 
Probability experiment, 3 
Probability space, definition, 12 
infinite discrete, 33 
uniform, 14 
Product probability space, 115 
Product rule, 97 
Product set, 38-39 


Random variables, definition, 158 
distribution of, 159 
expectation of, 168 
independence of, 188 
product of, 175 
standard deviation of, 211 
sum of, 175 
variance of, 208 

Random walk, 132-135, 137 

Relative frequency, 4 
conditional, 91 
of an event, 18 


Sample mean, 213 
Sample point, 12 
Sample space, 12 
See also Probability space 

Sample standard deviation, 213 
Sample variance, 213 
Samples, classified, 52-53

number of unordered, 67 
Significance level, 263 
Standard deviation, 211 

units of, 211 
Symmetry, definition, 147 

effect on random variables, 201 
Tree diagrams, 98
Triple plays, 218 


Uniform probability space, 14 
probability in, 19 

Union of sets, 82 

Urns, 28 


Variance, 208 
units of, 210 


